Preface


Intended Audience 


This guide is intended for system managers who install and use HP Availability 
Manager software. It is assumed that the system managers who use this product 
are familiar with Microsoft Windows terms and functions. 


Note 


The term Windows as it is used in this manual refers to either Windows 
2000 or Windows XP but not to any other Windows product. 


Document Structure 


This guide contains the following chapters and appendixes: 


Chapter 1 provides an overview of Availability Manager software, including 
security features. 


Chapter 2 tells how to set up and configure the Data Analyzer and Data
Server, how to start the Data Server and Data Analyzer, use the main System
Overview window, select a group of nodes and individual nodes, and use
online help.


Chapter 3 tells how to select nodes and display node data; it also explains 
what node data is. 


Chapter 4 tells how to display OpenVMS Cluster summary and detailed data; 
it also explains what cluster data is. 


Chapter 5 tells how to display and interpret events. 


Chapter 6 tells how to take a variety of corrective actions, called fixes, to 
improve system availability. 


Chapter 7 describes the tasks you can perform to filter, select, and customize 
the display of data and events. 


Appendix A contains a table of CPU process states that are referred to in 
Section 3.2.2.4 and in Section 3.3.1. 


Appendix B contains a table of OpenVMS and Windows events that can be 
displayed in the Event pane discussed in Chapter 5. 


Appendix C describes the events that can be signaled for each type of
OpenVMS data that is collected.


Related Documents


The following manuals provide additional information:


• HP OpenVMS System Manager’s Manual describes tasks for managing
an OpenVMS system. It also describes installing a product with the
POLYCENTER Software Installation Utility.


• HP OpenVMS System Management Utilities Reference Manual describes
utilities you can use to manage an OpenVMS system.


• HP OpenVMS Programming Concepts Manual explains OpenVMS lock
management concepts.


For additional information about HP OpenVMS products and services, see: 


http://www.hp.com/go/openvms


Reader’s Comments 


HP welcomes your comments on this manual. Please send comments to either of 
the following addresses: 
Internet openvmsdoc@hp.com 


Postal Mail Hewlett-Packard Company 
OSSG Documentation Group, ZKO3-4/U08 
110 Spit Brook Rd. 
Nashua, NH 03062-2698 


How to Order Additional Documentation 


For information about how to order additional documentation, see: 


http://www.hp.com/go/openvms/doc/order


Conventions 
The following conventions are used in this guide: 


Ctrl/x A sequence such as Ctrl/x indicates that you must hold down 
the key labeled Ctrl while you press another key or a pointing 
device button. 


PF1 x A sequence such as PF1 x indicates that you must first press
and release the key labeled PF1 and then press and release 
another key or a pointing device button. 


Return In examples, a key name enclosed in a box indicates that 
you press a key on the keyboard. (In text, a key name is not 
enclosed in a box.) 


In the HTML version of this document, this convention appears 
as brackets, rather than a box. 


... A horizontal ellipsis in examples indicates one of the following
possibilities:


• Additional optional arguments in a statement have been
omitted.


• The preceding item or items can be repeated one or more
times.


• Additional parameters, values, or other information can be
entered.


.
.
. A vertical ellipsis indicates the omission of items from a code
example or command format; the items are omitted because
they are not important to the topic being discussed.


( ) In command format descriptions, parentheses indicate that you
must enclose choices in parentheses if you specify more than
one.


[ ] In command format descriptions, brackets indicate optional
choices. You can choose one or more items or no items.

Do not type the brackets on the command line. However,
you must include the brackets in the syntax for OpenVMS
directory specifications and for a substring specification in an
assignment statement.


| In command format descriptions, vertical bars separate choices
within brackets or braces. Within brackets, the choices are
optional; within braces, at least one choice is required. Do not
type the vertical bars on the command line.


{ } In command format descriptions, braces indicate required
choices; you must choose at least one of the items listed. Do
not type the braces on the command line.


bold type Bold type represents the introduction of a new term. It also
represents the name of an argument, an attribute, or a reason.


italic type Italic type indicates important information, complete titles
of manuals, or variables. Variables include information that
varies in system output (Internal error number), in command
lines (PRODUCER=name), and in command parameters in
text (where dd represents the predefined code for the device
type).


UPPERCASE TYPE Uppercase type indicates a command, the name of a routine,
the name of a file, or the abbreviation for a system privilege.


Example This typeface indicates code examples, command examples, and
interactive screen displays. In text, this type also identifies
URLs, UNIX commands and pathnames, PC-based commands
and folders, and certain elements of the C programming
language.


- A hyphen at the end of a command format description,
command line, or code line indicates that the command or
statement continues on the following line.


numbers All numbers in text are assumed to be decimal unless
otherwise noted. Nondecimal radixes (binary, octal, or
hexadecimal) are explicitly indicated.
Overview


This chapter answers the following questions: 

• What is the HP Availability Manager?

• How does the Availability Manager work?

• How does the Availability Manager maintain security?

• How does the Availability Manager identify possible performance problems?


1.1 What Is the HP Availability Manager? 


The HP Availability Manager is a system management tool that allows you to 
monitor, from an OpenVMS or Windows node, one or more OpenVMS nodes on an 
extended local area network (LAN). 


The Availability Manager helps system managers and analysts target a specific 
node or process for detailed analysis. This tool collects system and process data 
from multiple OpenVMS nodes simultaneously, analyzes the data, and displays 
the output using a graphical user interface (GUI). 


Features and Benefits 


The Availability Manager offers many features that can help system managers 
improve the availability, accessibility, and performance of OpenVMS nodes and 
clusters. 


Feature                  Description


Immediate notification   Based on its analysis of data, the Availability Manager notifies
of problems              you immediately if any node you are monitoring is experiencing
                         a performance problem, especially one that affects the node’s
                         accessibility to users. At a glance, you can see whether a
                         problem is a persistent one that warrants further investigation
                         and correction.


Centralized              Provides centralized management of remote nodes within an
management               extended local area network (LAN).


Intuitive interface      Provides an easy-to-learn and easy-to-use graphical user
                         interface (GUI). An earlier version of the tool, DECamds, uses
                         a Motif GUI to display information about OpenVMS nodes. The
                         Availability Manager uses a Java GUI to display information
                         about OpenVMS nodes on an OpenVMS or a Windows node.


Correction capability    Allows real-time intervention, including adjustment of node and
                         process parameters, even when remote nodes are hung.


Uses its own protocol    An important advantage of the Availability Manager is that
                         it uses its own network protocol. Unlike most performance
                         monitors, the Availability Manager does not rely on TCP/IP
                         or any other standard protocol. Therefore, even if a standard
                         protocol is unavailable, the Availability Manager can continue to
                         operate.


Customization            Using a wide range of customization options, you can customize
                         the Availability Manager to meet the requirements of your
                         particular site. For example, you can change the severity levels
                         of the events that are displayed and escalate their importance.


Scalability              Makes it easier to monitor multiple OpenVMS nodes.


Figure 1-1 is an example of the initial System Overview window of the
Availability Manager.


Figure 1-1 System Overview Window 
The System Overview window is divided into the following sections:


• In the upper section of the display is a list of user-defined groups and a list
of nodes in each group. You can compress the display to only the name of a 
group by clicking the handle preceding the group name. The summary group 
line remains, showing the collected information for all the nodes in the group. 


If a node name displays a red icon, you can hold the cursor over the icon, 
the node name, or the number in the Events column to display a tooltip 
explaining what the problem is; for example, for the node DBGAVC, the 
following message is displayed: 


HIHRDP, high hard page fault rate 


This section of the window is called the Group/Node pane. 
• In the lower section of the window, events are posted, alerting you to possible
problems on your system. The items on the pane vary, depending on the 
severity of the problem: the most severe problems are displayed first. This 
section of the window is called the Event pane. 


1.2 How Does the Availability Manager Work? 
The Availability Manager consists of the following parts: 


Data Collector


Runs on OpenVMS nodes and has two purposes:


• Accepts requests for data from a Data Analyzer


• Allows a Data Analyzer or Data Server to communicate with other Data
Collectors


Data Analyzer


Runs on an OpenVMS or Windows node. It displays collected data in an
easy-to-use graphical user interface (GUI).


Data Server 


Runs on an OpenVMS or Windows node. It allows the Data Collector and 
Data Analyzer to communicate over a wide area network (WAN) using the 
Internet Protocol (IP) suite. 


The next two sections describe how these parts work together on an extended 
LAN and on a WAN. 


1.2.1 Data Analyzer and Data Collector on the Same Extended LAN 


The Data Analyzer and Data Collector communicate over an extended LAN using 
an IEEE 802.3 Extended Packet format protocol. Once a connection between the 
Data Analyzer and the Data Collector is established, the Data Analyzer instructs 
the Data Collector to gather specific system and process data. 


Although the Data Analyzer can be run on a member of a monitored cluster, it is 
typically run on a system that is not a member of a monitored cluster. Because of 
this, the Data Analyzer does not hang if the cluster hangs. 


When the Data Analyzer and Data Collectors reside on the same extended 
LAN, they can communicate directly with each other. Restrictions on this direct 
communication setup are as follows: 


• Only one Data Analyzer can run on a system at one time.

• Communication between the Data Analyzer and Data Collectors is not
routable in an IP network.


Note


The Availability Manager protocol is based on the 802.3 Extended Packet 
Format (also known as SNAP). The IEEE Availability Manager protocol 
values are as follows: 


Protocol ID: 08-00-2B-80-48 
Multicast Address: 09-00-2B-02-01-09 


If your routers filter protocols in your network, add these values to your 
network protocols so that the private transport is propagated over the 
routers. 
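
When you prepare a router or capture filter, the two values in the note can be checked programmatically. The following Python sketch assumes the standard LLC/SNAP layout (DSAP AA, SSAP AA, control 03, followed by the 5-byte protocol identifier); the function names are illustrative and are not part of the Availability Manager.

```python
# Values quoted in the note above.
PROTOCOL_ID = "08-00-2B-80-48"
MULTICAST_ADDRESS = "09-00-2B-02-01-09"


def parse_hex_bytes(text: str) -> bytes:
    """Convert a hyphen-separated hex string such as 08-00-2B into bytes."""
    return bytes(int(part, 16) for part in text.split("-"))


def is_multicast(mac: bytes) -> bool:
    """An Ethernet address is a group (multicast) address when the
    least significant bit of its first octet is set."""
    return bool(mac[0] & 0x01)


def snap_header(protocol_id: bytes) -> bytes:
    """Build the 8-byte LLC/SNAP header: DSAP AA, SSAP AA, control 03,
    then the 5-byte protocol identifier (3-byte OUI + 2-byte type)."""
    if len(protocol_id) != 5:
        raise ValueError("SNAP protocol identifier must be 5 bytes")
    return bytes([0xAA, 0xAA, 0x03]) + protocol_id


header = snap_header(parse_hex_bytes(PROTOCOL_ID))
print(header.hex("-").upper())
print(is_multicast(parse_hex_bytes(MULTICAST_ADDRESS)))
```

The second check confirms that the address in the note is a group address, which is why it has to be allowed through any filter that blocks unknown multicast traffic.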


Figure 1-2 shows a possible configuration of nodes running Data Analyzers and 
Data Collectors on an extended LAN. 


Figure 1-2 Availability Manager Node Configuration for an Extended LAN 
In Figure 1-2, the Data Analyzer can monitor nodes A, B, and C across the
network. The password on node D does not match the password of the Data 
Analyzer; therefore, the Data Analyzer cannot monitor node D. 


For information about password security, see Section 1.3. 


Requesting and Receiving Information over an Extended LAN 


After installing the Availability Manager software, you can begin to request 
information from Data Collectors on one or more nodes. 


Requesting and receiving information requires the Availability Manager to 
perform several steps, which are shown in Figure 1-3 and explained in the text
following the figure. 
Figure 1-3 Requesting and Receiving Information over an Extended LAN


The following steps correspond to the numbers in Figure 1-3.


1. The Data Analyzer passes a user’s request for data to the driver on the Data
Analyzer node:


• On Windows systems, the Windows driver is part of the Windows kit.


• On OpenVMS systems, the OpenVMS driver is called the Data Collector
driver and is included in the Data Collector kit. This is the same driver
that is on the Data Collector node.


2. The driver on the Data Analyzer transmits the request across the network to
the driver on the Data Collector node.


3. The driver on the Data Collector transmits the requested information as data
over the network to the driver on the Data Analyzer node.


4. The driver on the Data Analyzer node passes the data to the Data Analyzer,
which displays the data.


In step 4, the Data Analyzer also checks the data against various thresholds and 
conditions, and posts events if the thresholds are exceeded or the conditions met. 
The following section explains how data analysis and event detection work. 
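
The check described for step 4 can be pictured as comparing each collected value with a threshold and posting an event when the threshold is exceeded. The Python sketch below is only an illustration: the rule table, the threshold value, and the data item name are hypothetical, and only the event name HIHRDP and its description are taken from this chapter. It does not model conditions or collection intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: str
    name: str
    description: str


# Hypothetical rule table: data item -> (threshold, event name, description).
# The threshold value is invented for illustration.
RULES = {
    "hard_page_fault_rate": (10.0, "HIHRDP", "high hard page fault rate"),
}


def check_thresholds(node: str, sample: dict) -> list:
    """Return one Event for every data item whose value exceeds its threshold."""
    events = []
    for item, value in sample.items():
        rule = RULES.get(item)
        if rule is None:
            continue  # no threshold defined for this data item
        threshold, name, description = rule
        if value > threshold:
            events.append(Event(node, name, description))
    return events


for event in check_thresholds("DBGAVC", {"hard_page_fault_rate": 42.0}):
    print(f"{event.node}: {event.name}, {event.description}")
```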

Data Collector Notes 

There are some characteristics to note about the Data Collector drivers on 
OpenVMS and Windows. 


• The Data Collector on a Data Collector node can collect data for more than
one Data Analyzer node at the same time.


• The Data Collector driver on an OpenVMS Data Analyzer node can only
support one Data Analyzer at a time.


• The Data Collector driver on a Windows Data Analyzer node can only support
one Data Analyzer connection to a network adapter at a time.


1.2.2 Data Analyzer and Data Collector Connected over a WAN 


The Data Analyzer can communicate only with Data Collectors that are on an 
extended LAN. (LANs are usually limited to a building or even just to a computer 
room.) However, you might need to run a Data Analyzer on a node that is not 
part of an extended LAN—for example, from home or at another site. To do this, 
you must add a Data Server node to your extended LAN. 


The purpose of the Data Server node is to relay data between the Data Analyzer 
and Data Collectors. The Data Server formats data for transport to and from the 
Data Analyzer over a WAN. 


Figure 1-4 shows an example of adding a Data Server and WAN connection to 
Figure 1-2. 


Figure 1-4 Availability Manager Node Configuration for a WAN 
In Figure 1-4, the Data Analyzer monitors Data Collector nodes by passing data
through the Data Server. When you start the Data Analyzer, you direct it to 
connect to the Data Server over the WAN. Once the connection is established, the 
Data Analyzer can connect to Data Collectors through the Data Server and start 
collecting data. 
Requesting and Receiving Information over a WAN

After installing the Availability Manager software, you can begin to request 
information from Data Collectors on one or more nodes. 

Requesting and receiving information requires the Availability Manager to 
perform several steps, which are shown in Figure 1-5 and explained in the text
following the figure. 


Figure 1-5 Requesting and Receiving Information Over a WAN 
The following steps correspond to the numbers in Figure 1-5.


1. The Data Analyzer passes a user’s request for data to the IP socket connection
on the Data Analyzer node.


2. Using a secure socket, the IP socket transmits the request to the IP socket
connection on the Data Server node.


3. The IP socket on the Data Server node passes the request to the Data Server.


4. The Data Server passes the request to the driver on the Data Server
node:


• On Windows systems, the Windows driver is part of the Windows kit.


• On OpenVMS systems, the OpenVMS driver is called the Data Collector
driver and is included in the Data Collector kit. This is the same driver
that is on the Data Collector node.


5. The driver on the Data Server transmits the request across the network to
the driver on the Data Collector node.


6. The driver on the Data Collector transmits the requested information as data
over the network to the driver on the Data Server node.


7. The driver on the Data Server node passes the data to the Data Server.


8. The Data Server passes the data to the IP socket connection.


9. The IP socket on the Data Server node transmits the data to the IP socket on
the Data Analyzer node.


10. The IP socket on the Data Analyzer node passes the data to the Data
Analyzer, which displays the data.


In step 10, the Data Analyzer also checks the data for any events that need to 
be posted. The following section explains how data analysis and event detection 
work. 


Note 


More than one Windows or OpenVMS Data Analyzer node can connect to 
a Data Server node. 


1.3 How Does the Availability Manager Maintain Security? 


The Availability Manager uses passwords to maintain security. Passwords are 
eight alphanumeric characters long. The Data Analyzer stores passwords in its 
customization file. On OpenVMS Data Collector nodes, passwords are part of a 
three-part security code called a security triplet. 


The following sections explain these security methods further. 


1.3.1 Data Analyzer Password Security 
For monitoring to take place, the password on a Data Analyzer node must match
the password section of the security triplet on each OpenVMS Data Collector 
node. OpenVMS Data Collectors also impose other security measures, which are 
explained in Section 1.3.2. This password match is used whether or not a Data 
Server is involved in the connection between the Data Analyzer and the Data 
Collector. 


Figure 1-6 illustrates how you can use passwords to limit access to node
information. 
Figure 1-6 Availability Manager Password Matching


As shown in Figure 1-6, the Testing Department’s Data Analyzer, whose
password is HOMERUNS, can access only OpenVMS Data Collector nodes with 
the HOMERUNS password as part of their security triplets. The same is true of 
the Accounting Department’s Data Analyzer, whose password is BATTERUP; it 
can access only OpenVMS Data Collector nodes with the BATTERUP password 
as part of their security triplets. 


The Availability Manager sets a default password when you install the Data 
Analyzer. To change that password, you must use the OpenVMS Security 
Customization page (see Figure 7-21), which is explained in Chapter 7.


1.3.2 OpenVMS Data Collector Security


OpenVMS Data Collector nodes have the following security features:


• Availability Manager data-transfer security


Each OpenVMS node running as a Data Collector has a file containing a list 
of security triplets. For Data Analyzer and Data Collector nodes to exchange 
data, the passwords on these nodes must match. 


In addition, the triplet specifies the type of access a Data Analyzer has. By 
specifying the hardware address of the Data Analyzer, the triplet can also 
restrict which Data Analyzer nodes are able to access the Data Collector. 


Section 1.3.3 explains security triplets and how to edit them.


• Availability Manager security log


An OpenVMS Data Collector logs all access denials and executed
write instructions to the operator communications manager (OPCOM).
Messages are displayed on all terminals that have OPCOM enabled (with
the REPLY/ENABLE command). OPCOM also puts messages in the
SYS$MANAGER:OPERATOR.LOG file.


Each security log entry contains the network address of the initiator. If 
access is denied, the log entry also indicates whether a read or write was 
attempted. If a write operation was performed, the log entry indicates the 
process identifier (PID) of the affected process. 
• OpenVMS file protection and process privileges


When the Availability Manager is installed, it creates a directory
(SYS$COMMON:[AMDS$AM]) and sets directory and file protections on
it so that only the SYSTEM account can read the files in that directory. For
additional security on these system-level directories and files, you can create 
access control lists (ACLs) to restrict and set alarms on write access to the 
security files. 


For more information about creating ACLs, see the HP OpenVMS Guide to 
System Security. 


1.3.3 Changing Security Triplets on OpenVMS Data Collector Nodes 


To change security triplets on an OpenVMS Data Collector node, you must edit 
the AMDS$DRIVER_ACCESS.DAT file, which is installed on all Data Collector 
nodes. The following sections explain what a security triplet is, how the Data 
Collector uses it, and how to change it. 


1.3.3.1 Understanding OpenVMS Security Triplets 


A security triplet determines which nodes can access system data from an 
OpenVMS Data Collector node. The AMDS$DRIVER_ACCESS.DAT file on 
OpenVMS Data Collector nodes lists security triplets. 


On OpenVMS Data Collector nodes, the AMDS$AM_CONFIG logical translates to 
the location of the default security file, AMDS$DRIVER_ACCESS.DAT. This file 
is installed on all OpenVMS Data Collector nodes. 


A security triplet is a three-part record whose fields are separated by backslashes
(\). A triplet consists of the following fields:


• A network address (hardware address or wildcard character)


• An 8-character alphanumeric password


The password is not case sensitive (so the passwords “testtest” and
“TESTTEST” are considered to be the same).


• A read, write, or control (R, W, or C) access verification code


The exclamation point (!) is a comment delimiter; any characters to the right of 
the comment delimiter are ignored. 


Example 


All Data Collector nodes in group FINANCE have the following AMDS$DRIVER_ 
ACCESS.DAT file: 


*\FINGROUP\R      ! Let anyone with FINGROUP password read
!
2.1\DEVGROUP\W    ! Let only DECnet node 2.1 with
                  ! DEVGROUP password perform fixes (writes)
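
The record format just described (three backslash-separated fields, with everything after an exclamation point ignored) can be read with a few lines of code. This Python sketch is an illustration only; the Triplet type and the parse_triplet function are hypothetical names, and the sketch does not reproduce how the Data Collector driver itself loads the file.

```python
from typing import NamedTuple, Optional


class Triplet(NamedTuple):
    address: str   # hardware address, DECnet address, or "*" wildcard
    password: str  # stored in uppercase because matching ignores case
    access: str    # "R", "W", or "C"


def parse_triplet(line: str) -> Optional[Triplet]:
    """Parse one line of the access file; return None for blank or comment-only lines."""
    text = line.split("!", 1)[0].strip()  # drop everything after the comment delimiter
    if not text:
        return None
    fields = [field.strip() for field in text.split("\\")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 backslash-separated fields: {line!r}")
    address, password, access = fields
    return Triplet(address.upper(), password.upper(), access.upper())


# The FINANCE group example from the text above.
SAMPLE = r"""
*\FINGROUP\R    ! Let anyone with FINGROUP password read
!
2.1\DEVGROUP\W  ! Let only DECnet node 2.1 with
                ! DEVGROUP password perform fixes (writes)
"""

triplets = [t for t in map(parse_triplet, SAMPLE.splitlines()) if t is not None]
for triplet in triplets:
    print(triplet)
```

Only two records survive parsing; the comment-only and blank lines are skipped.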


1.3.3.2 How to Change a Security Triplet


Note


The configuration files for DECamds and the Availability Manager are 
separate; only one set is used, depending on which startup command 
procedure you use to start the driver. 


For more information about the configuration file setup for both 
DECamds and the Availability Manager, see the HP Availability Manager 
Installation Instructions. 
On each Data Collector node on which you want to change security, you must
edit the AMDS$DRIVER_ACCESS.DAT file. The data in the AMDS$DRIVER_ 
ACCESS.DAT file is set up as follows: 


Network address\password\access


Use a backslash character (\) to separate the three fields.


To edit the AMDS$DRIVER_ACCESS.DAT file, follow these steps:


1. Edit the network address.


The network address can be either of the following: 


• Hardware address


The hardware address field is the physical hardware address in the LAN 
device chip. It is used if you have multiple LAN devices or are running 
the HP DECnet-Plus for OpenVMS networking software on the system 
(not the HP DECnet Phase IV for OpenVMS networking software). 


For devices provided by HP, the hardware address is in the form
08-00-2B-xx-xx-xx, where the 08-00-2B portion is HP’s valid range of LAN
addresses as defined by the IEEE 802 standards, and the xx-xx-xx portion
is chip specific.


To determine the value of the hardware address on a node, use the 
OpenVMS System Dump Analyzer (SDA) as follows: 


$ ANALYZE/SYSTEM 
SDA> SHOW LAN 


These commands display a list of available devices. Choose the template 
device of the LAN device you will be using, and then enter the following 
command: 


SDA> SHOW LAN/DEVICE=xxA0


• Wildcard address


The wildcard character (*) allows any incoming triplet with a matching 
password field to access the Data Collector node. Use the wildcard 
character to allow read access and to run the console application from any 
node in your network. 


Caution: Use of the wildcard character for write-access or
control-access security triplets enables any person using that node to
perform system-altering fixes.


2. Edit the password field. 


The password field must be an 8-byte alphanumeric field. The Availability 
Manager forces upper-case on the password, so "aaaaaaaa" and "AAAAAAAA" 
are essentially the same password to the Data Collector. 


The password field gives you a second level of protection when you want to 
use the wildcard address denotation to allow multiple modes of access to your 
monitored system. 


3. Enter R, W, or C as an access code: 


• R means READONLY access to the Data Analyzer.


• W means READ/WRITE access to the Data Analyzer. (WRITE implies
READ.)


• C means CONTROL access to the Data Analyzer. CONTROL allows you
to manipulate objects from which data are derived. (CONTROL implies
both WRITE and READ.)


The following security triplets are all valid; an explanation follows the 
exclamation point (!). 


*\1decamds\r                  ! Anyone with password "1decamds" can monitor
*\1decamds\w                  ! Anyone with password "1decamds" can monitor or write
2.1\1decamds\r                ! Only node 2.1 with password "1decamds" can monitor
2.1\1decamds\w                ! Only node 2.1 with password "1decamds" can monitor and write
08-00-2b-03-23-cd\1decamds\w  ! Allows a particular hardware address to write
08-00-2b-03-23-cd\1decamds\r  ! Allows a particular hardware address to read node
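
Before saving an edited AMDS$DRIVER_ACCESS.DAT file, the three fields from the steps above can be checked line by line. The following Python sketch is a hypothetical helper, not a tool supplied with the Availability Manager. It assumes that "alphanumeric" means ASCII letters and digits, and it accepts any number.number value as a DECnet address without range checking.

```python
import re

HARDWARE_ADDRESS = re.compile(r"[0-9A-F]{2}(-[0-9A-F]{2}){5}")
DECNET_ADDRESS = re.compile(r"\d+\.\d+")   # area.node, as in "2.1" (ranges not checked)
PASSWORD = re.compile(r"[0-9A-Z]{8}")      # assumes ASCII letters and digits only


def triplet_problems(line: str) -> list:
    """Return a list of human-readable problems; an empty list means the line looks valid."""
    # Drop the comment and fold to uppercase, since the password is not case sensitive.
    text = line.split("!", 1)[0].strip().upper()
    fields = text.split("\\")
    if len(fields) != 3:
        return ["expected three fields separated by backslashes"]
    address, password, access = (f.strip() for f in fields)
    problems = []
    if address != "*" and not (
        HARDWARE_ADDRESS.fullmatch(address) or DECNET_ADDRESS.fullmatch(address)
    ):
        problems.append(f"unrecognized network address: {address}")
    if not PASSWORD.fullmatch(password):
        problems.append("password must be 8 alphanumeric characters")
    if access not in ("R", "W", "C"):
        problems.append(f"access code must be R, W, or C, not {access!r}")
    return problems


print(triplet_problems(r"08-00-2b-03-23-cd\1decamds\w ! hardware address may write"))
print(triplet_problems(r"*\short\x"))
```

The first call reports no problems; the second reports both a bad password length and an unknown access code.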


OpenVMS Data Collector nodes accept more than one password. Therefore, you 
might have several security triplets in an AMDS$DRIVER_ACCESS.DAT file for 
one Data Collector node. For example: 


*\1DECAMDS\R
*\KOINECLS\R
*\KOINEFIX\W
*\AVAILMAN\C


In this example, Data Analyzer nodes with the passwords 1DECAMDS and
KOINECLS are able to see the Data Collector data, but only the Data Analyzer 
node with the KOINEFIX password is able to write or change information, 
including performing fixes, on the Data Collector node. The Data Analyzer node 
with the AVAILMAN password is able to perform switched LAN fixes and other 
control functions. 


You can choose to set up your AMDS$DRIVER_ACCESS.DAT file to allow anyone 
on the local LAN to read from your system, but to allow only certain nodes to 
write or change process or device characteristics on your system. For example: 


*\1DECAMDS\R
08-00-2B-03-23-CD\2NODEFIX\C 


In this example, any Data Analyzer node using the 1DECAMDS password can
read data from your system. However, only the Data Analyzer node with the 
hardware address 08-00-2B-03-23-CD and the password 2NODEFIX can perform 
fixes and other control functions. 


Note 


After editing the AMDS$DRIVER_ACCESS.DAT file, you must stop and 
then restart the Data Collector. This action loads the new data into the 
driver. 


1.3.4 Processing Security Triplets 
The Availability Manager performs these steps when using security triplets to
ensure security among Data Analyzer and Data Collector nodes: 


1. A multicast “Hello” message is broadcast at regular intervals to all nodes 
within the LAN indicating the availability of a Data Collector node to 
communicate with a Data Analyzer node. 
2. The node running the Data Analyzer receives the message, returns a
password to the Data Collector, and requests system data from the Data 
Collector. 


3. The password and network address of the Data Analyzer are used to search 
the security triplets in the AMDS$DRIVER_ACCESS.DAT file. 


• If the Data Analyzer password and network address match one of the
security triplets on the Data Collector, then the Data Collector and the
Data Analyzer can exchange information.


• If the Data Analyzer password and network address do not match any
of the security triplets, then access is denied and a message is logged
to OPCOM. (See Table 1-2 for more information on logging this type of
message.) In addition, the Data Analyzer receives a message stating that
access to that node is not permitted.


Table 1-1 describes how the Data Collector node interprets a security triplet
match.


Table 1-1 Security Triplet Verification


Security Triplet                 Interpretation


08-00-2B-12-34-56\HOMETOWN\W     The Data Analyzer has write access to the node
                                 only when the Data Analyzer is run from a node
                                 with this hardware address (multiadapter or
                                 DECnet-Plus system) and with the password
                                 HOMETOWN.


2.1\HOMETOWN\R                   The Data Analyzer has read access to the
                                 node when run from a node with DECnet
                                 for OpenVMS Phase IV address 2.1 and the
                                 password HOMETOWN.


*\HOMETOWN\R                     Any Data Analyzer with the password
                                 HOMETOWN has read access to the node.
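
The matching rules in this section can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the driver's actual algorithm: the access_allowed function is a hypothetical name, the three rows of Table 1-1 are treated as if they were in one file, and the sketch assumes that any single matching triplet with a sufficient access level is enough.

```python
# Access codes ordered so that a higher level implies every lower one
# (CONTROL implies WRITE and READ; WRITE implies READ).
LEVEL = {"R": 1, "W": 2, "C": 3}


def access_allowed(triplets, address, password, requested):
    """Return True when some triplet matches the Data Analyzer's address and
    password and grants at least the requested access level."""
    for triplet_address, triplet_password, granted in triplets:
        address_matches = (
            triplet_address == "*" or triplet_address.upper() == address.upper()
        )
        password_matches = triplet_password.upper() == password.upper()
        level_sufficient = LEVEL[granted.upper()] >= LEVEL[requested.upper()]
        if address_matches and password_matches and level_sufficient:
            return True
    return False  # at this point the Data Collector would deny access and log to OPCOM


# The three triplets from Table 1-1, combined for illustration.
TABLE_1_1 = [
    ("08-00-2B-12-34-56", "HOMETOWN", "W"),
    ("2.1", "HOMETOWN", "R"),
    ("*", "HOMETOWN", "R"),
]

print(access_allowed(TABLE_1_1, "08-00-2B-12-34-56", "hometown", "W"))
print(access_allowed(TABLE_1_1, "08-00-2B-AA-BB-CC", "HOMETOWN", "R"))
print(access_allowed(TABLE_1_1, "08-00-2B-AA-BB-CC", "HOMETOWN", "W"))
print(access_allowed(TABLE_1_1, "2.1", "WRONGPWD", "R"))
```

With these rows, write access is granted only to the listed hardware address, while any address holding the HOMETOWN password can read through the wildcard triplet.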


Sending Messages to OPCOM 


The logical names shown in Table 1-2 control the sending of messages to OPCOM
and are defined in the AMDS$LOGICALS.COM file on the Data Collector node. 


Table 1-2 DECamds Logical Names for OPCOM Messages


AMDS$RM_OPCOM_READ     A value of TRUE logs read failures to OPCOM.
AMDS$RM_OPCOM_WRITE    A value of TRUE logs write failures to OPCOM.


To put these changes into effect, restart the Data Collector with the following 
command: 


$ @SYS$STARTUP:AMDS$STARTUP RESTART
1.4 How Does the Availability Manager Data Analyzer Identify
Performance Problems?


When the Data Analyzer detects problems on your system, it uses a combination
of methods to bring these problems to the attention of the system manager.
It examines both the types of data collected and how often it is collected and
analyzed to determine problem areas to be signaled. Performance problems
are also posted in the Event pane, which is in the lower portion of the System
Overview window (Figure 1-1).


The following topics are related to the method of detecting problems and posting 
events: 


• Collecting and analyzing data


• Posting events


1.4.1 Collecting and Analyzing Data


This section explains how the Data Analyzer collects and analyzes data. It also
defines related terms.


1.4.1.1 Events and Data Collection


The data that the Data Analyzer collects is grouped into data collections. 
These collections are composed of related data—for example, CPU data, memory 
data, and so on. Usually, the data items on the tabs (like the ones displayed in 
Figure 1-7) consist of one data collection. 


Figure 1-7 Sample Node Summary 
An event is a problem or potential problem associated with resource availability.
Events are associated with various data collections. For example, the CPU 
Process data collection shown in Figure 1-8 is associated with the PRCCUR, 
PRCMWT, and PRCPWT events. (Appendix B describes events, and Appendix C 
describes the events that each type of data collection can signal.) For these events 
to be signalled, you must enable the CPU Process data collection, as described in 
Section 1.4.1.2. 


Users can also customize criteria for events, which is described in Section 1.4.2. 


1.4.1.2 Types of Data Collection 


You can use the Data Analyzer to collect data either as a background activity or 
as a foreground activity. 


Note that for either type of data collection, if you collect data for a specific node, 
only that node is affected. If you collect data for a group, all the nodes in that 
group are affected. 


• Background data collection


When you enable background collection of a specific type of data
on a specific node, the Data Analyzer collects that data whether or not any
windows are currently displaying data for that node.


To enable background data collection, select the check box for a specific type 
of data collection on the Data Collection Customization page (Figure 1-8). 
Note that if the Customize window applies to all OpenVMS nodes, the data 
collection properties that you set are for all nodes. If the window applies to a 
specific node, the properties you set apply only to that node. 


Chapter 7 contains additional instructions for customizing data collection 
properties. 

Figure 1-8 Data Collection Customization


• Foreground data collection


Foreground data collection occurs automatically when you open any data page 
for a specific node. To open a node data page, double-click a node name in the 
Node pane of the System Overview window (Figure 1-1). The Node Summary 
page is the first page displayed (by default); Figure 1-7 is an example. At the 
top of the page are tabs that you can select to display other data pages for 
that node. 


Foreground data collection for all data types begins automatically when any 
node data page is displayed. Data collection ends when all node data pages 
have been closed. 


Chapter 3 contains instructions for selecting nodes and displaying node data. 


1.4.1.3 Data Collection Intervals 


Data collection intervals, which are displayed on the Data Collection
Customization page (Figure 1-8), specify the frequency of data collection.
Table 1-3 describes these intervals. 

Table 1-3 Data Collection Intervals

Interval       Type of Data
(in seconds)   Collection     Description

NoEvent        Background     How often data is collected if no events have
                              been posted for that type of data.

                              The Data Analyzer starts background data
                              collection at the NoEvent interval (for example,
                              every 75 seconds). If no events have been posted
                              for that type of data, the Data Analyzer starts a
                              new collection cycle every 75 seconds.

Event          Background     How often data is collected if any events have
                              been posted for that type of data.

                              The Data Analyzer continues background data
                              collection at the Event interval until all events
                              for that type of data have been removed from the
                              Event pane. Data collection then resumes at the
                              NoEvent interval.

Display        Foreground     How often data is collected when the page for a
                              specific node is open.

                              The Data Analyzer starts foreground data
                              collection at the Display interval and continues
                              this rate of collection until the display is
                              closed. Data collection then resumes as a
                              background activity.
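

The way the three intervals interact can be sketched in a few lines of Python. This is only an illustration of the rules in Table 1-3, not code from the Availability Manager. The class and function names are hypothetical, the 75-second NoEvent value comes from the example in the table, and the Event and Display values are placeholders chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class CollectionIntervals:
    """Intervals, in seconds, for one type of data collection (hypothetical model)."""
    no_event: float  # background collection, no events posted
    event: float     # background collection, at least one event posted
    display: float   # foreground collection, a node data page is open


def current_interval(intervals, page_open, events_posted):
    """Return the interval that applies to the current state, per Table 1-3."""
    if page_open:
        # Foreground collection runs at the Display interval until the page closes.
        return intervals.display
    if events_posted > 0:
        # Background collection stays at the Event interval while events remain.
        return intervals.event
    return intervals.no_event


# 75 is the example value from the table; 15 and 5 are placeholder values.
cpu = CollectionIntervals(no_event=75.0, event=15.0, display=5.0)
print(current_interval(cpu, page_open=False, events_posted=0))
```

The sketch treats an open node data page as taking priority over the background intervals, which is one reading of the Display row in the table.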


1.4.2 Posting Events 


The Data Analyzer evaluates each data collection for events. The Data Analyzer 
posts events when data values in a data collection meet or exceed user-defined
thresholds and occurrences. Values for thresholds and occurrences are
displayed on Event Customization pages similar to the one shown in Figure 1-9. 
Thresholds and occurrences are described in the next section. 


Figure 1-9 Sample Event Customization 

1.4.2.1 Thresholds and Occurrences


Thresholds and occurrences are criteria that the Data Analyzer uses for posting 
events. 


A threshold is a value against which data in a data collection is compared. An 
occurrence is a value that represents the number of consecutive data collections 
that meet or exceed the threshold. 


Both thresholds and occurrences are customizable values that you can adjust 
according to the needs of your system. For details about how to change the values 
for thresholds and occurrences, see Chapter 7. 


Relationship Between Thresholds and Occurrences 

For a particular event, when the data collected meet or exceed the threshold, 
the data collection enters a threshold-exceeded state. When the number of 
consecutive data collections to enter this state meets or exceeds the value in the 
Occurrence box (see Figure 1-9), the Data Analyzer displays (posts) the event in 
the Event pane. 


A closer look at Figure 1-9 shows the relationship between thresholds and 
occurrences. For the DSKERR, high disk device error count event, a threshold 
of 15 errors has been set. A value of 2 in the Occurrence box indicates that the 
number of errors during 2 consecutive data collections must meet or exceed the 
threshold of 15 for the DSKERR event to be posted. 
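

The relationship between a threshold and an occurrence value can be modeled with a small counter. The following Python sketch is an assumption-laden illustration, not the Data Analyzer's implementation. The class name and method are hypothetical, and the sketch assumes that a collection below the threshold resets the count, which is what "consecutive" implies. It uses the DSKERR values from the example above (threshold 15, occurrence 2).

```python
class EventCriterion:
    """Hypothetical model of one event's threshold and occurrence settings."""

    def __init__(self, threshold, occurrence):
        self.threshold = threshold    # value that collected data is compared against
        self.occurrence = occurrence  # consecutive collections needed to post
        self.consecutive = 0          # collections currently in a row at or over threshold

    def evaluate(self, value):
        """Feed one collected value; return True when the event would be posted."""
        if value >= self.threshold:
            self.consecutive += 1
        else:
            # Assumption: a collection below the threshold breaks the run.
            self.consecutive = 0
        return self.consecutive >= self.occurrence


dskerr = EventCriterion(threshold=15, occurrence=2)
# Error counts from four successive data collections (sample values).
results = [dskerr.evaluate(count) for count in [16, 3, 15, 20]]
print(results)
```

In this run, the first collection exceeds the threshold but is followed by a low value, so nothing is posted. Only the last two collections are consecutive, and the event is posted on the fourth.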

Getting Started


Note 


Before you start this chapter, be sure to read the explanation of data 
collections, events, thresholds, and occurrences, as well as background 
and foreground data collection in Chapter 1. 


This chapter provides the following information:

• How to configure and start the Availability Manager Data Collector
• How to start the Availability Manager Data Analyzer
• How to configure secure communications between the Data Analyzer and
  Data Server
• How to start the Availability Manager Data Server
• How to use the main System Overview window
• How to display basic node data
• How to get help when you need it
• How to print a Data Analyzer page


For information about installing the HP Availability Manager on OpenVMS or 
Windows systems, see the HP Availability Manager Installation Instructions. You 
can access these instructions from the documentation link at the Availability 
Manager web page at the following URL: 


http://www.hp.com/products/openvms/availabilitymanager


The installation instructions also include an explanation of how to install and use 
both DECamds and the Availability Manager on the same system. 


2.1 Configure and Start the Data Collector 


Configuration tasks include defining logical names and protecting passwords. 
After you complete these tasks, you can start the Data Collector. The following 
sections describe all of these operations. 


2.1.1 Defining Logical Names 


OpenVMS kits for DECamds Version 7.3-2B and Availability Manager
Version 3.1 provide a template file that system managers can modify
to define the logical names used by the Data Collector. You can
copy the file SYS$MANAGER:AMDS$SYSTARTUP.TEMPLATE to
SYS$MANAGER:AMDS$SYSTARTUP.COM and edit it to change the default
logicals that are used to start the Data Collector and to find its configuration
files.

The most common logicals, especially in a mixed-environment cluster
configuration, are the ones shown in Table 2-1: 


Table 2-1 Common Availability Manager Data Collector Logical Names

Logical                       Description

AMDS$GROUP_NAME               Specifies the group that this node will be
                              associated with when it is monitored.

AMDS$DEVICE                   For nodes with more than one network adapter,
                              allows you to specify which adapter the Data
                              Collector should use.

AMDS$RM_DEFAULT_INTERVAL      The number of seconds between multicast “Hello”
                              messages from the Data Collector to the Data
                              Analyzer node when the Data Collector is inactive
                              or minimally active.

                              The minimum value is 5. The maximum value is 300.

AMDS$RM_SECONDARY_INTERVAL    The number of seconds between multicast “Hello”
                              messages from the Data Collector to the Data
                              Analyzer node when the Data Collector is active.

                              The minimum value is 5. The maximum value is 1800.


Note 


Multicast “Hello” messages are notifications from nodes running the 
Data Collector to the Data Analyzer. This is the way the Data Analyzer 
discovers Data Collectors on the network. 


The Data Collector on a node transmits multicast “Hello” messages for any Data 
Analyzer or Data Server on the extended LAN to receive. The rate at which these 
messages are transmitted is regulated by the settings of the following logicals: 


AMDS$RM_DEFAULT_INTERVAL 
AMDS$RM_SECONDARY_INTERVAL 


These logicals are contained in the file SYS$MANAGER:AMDS$LOGICALS.COM.
The shorter the time interval, the faster the node is discovered and configured. 
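

Before you edit the interval logicals, it can help to check a proposed value against the ranges given in Table 2-1. The Python sketch below only encodes those documented minimum and maximum values. The dictionary and function names are hypothetical, and the sketch makes no claim about how the Data Collector itself treats an out-of-range value.

```python
# Documented ranges, in seconds, from Table 2-1.
HELLO_INTERVAL_LIMITS = {
    "AMDS$RM_DEFAULT_INTERVAL": (5, 300),
    "AMDS$RM_SECONDARY_INTERVAL": (5, 1800),
}


def check_hello_interval(logical_name, seconds):
    """Return True if seconds lies within the documented range for logical_name."""
    low, high = HELLO_INTERVAL_LIMITS[logical_name]
    return low <= seconds <= high


print(check_hello_interval("AMDS$RM_DEFAULT_INTERVAL", 30))
print(check_hello_interval("AMDS$RM_SECONDARY_INTERVAL", 4))
```

The first call checks a value inside the range and the second checks a value below the documented minimum of 5.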


2.1.2 Protecting Passwords 


To change passwords to allow a Data Analyzer to monitor a node, edit the 
following file: 


SYS$MANAGER:AMDS$DRIVER_ACCESS.DAT


The passwords section of the file is close to the end of the file, after the Password 
documentation section. The passwords in this file correspond to the passwords 
in the Security page shown in Section 7.9.1. Note that you can specify a list of 
passwords in this file. See the comments in the file for details. 

2.1.3 Starting the Data Collector


Beginning with OpenVMS Version 7.2, the files needed to run the Data Collector 
on OpenVMS nodes are shipped with the OpenVMS operating system. However, 
if you want the latest Data Collector software, you must install it from the 
Availability Manager Data Collector kit. Once the Data Collector is running on 
a node, you can monitor that node using DECamds or the Availability Manager 
Data Analyzer. 


For the Data Collector to access requests to collect data and to support the Data 
Analyzer, start the Data Collector by entering the following command: 


$ @SYS$STARTUP:AMDS$STARTUP START


To start the Data Collector when the system boots, add the following command to 
the SYS$MANAGER:SYSTARTUP_VMS.COM file: 


$ @SYS$STARTUP:AMDS$STARTUP START


If you make changes to either the AMDS$DRIVER_ACCESS.DAT or the
AMDS$LOGICALS.COM file, restart the driver to load the changes. Enter the
following command: 


$ @SYS$STARTUP:AMDS$STARTUP RESTART


Note 


You can start the Data Collector on all the nodes in a cluster by using the 
following SYSMAN command: 


$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> DO @SYS$STARTUP:AMDS$STARTUP START
SYSMAN> EXIT
$


This method works for any AMDS$STARTUP option. 


2.2 How to Start the Data Analyzer


This section describes what you need to do after the Availability Manager Data 
Analyzer is installed. Starting the Data Analyzer is somewhat different on 
OpenVMS than on Windows systems. However, on both systems, starting the 
Data Analyzer automatically starts the Java™ graphical user interface (GUI),
which allows you to view information that is collected from Data Collectors 
running on OpenVMS nodes. 


The following sections describe the sequence of steps required to start the Data 
Analyzer on an OpenVMS node and on a Windows node. 


2.2.1 Starting the Data Analyzer on an OpenVMS Node 
To start a Data Analyzer on an OpenVMS node, make sure that:

• The Data Analyzer is installed on the node from which you want to monitor
  other nodes.
• The Data Collector is started (see Section 2.1.3).
  Starting the Data Collector accomplishes the following important tasks:
  • Defines the various AMDS$* logicals needed by the Data Analyzer.
  • Allows the Data Analyzer to communicate with the Data Collector on the
    network.


To start the Data Analyzer, enter the following command:

$ AVAIL/ANALYZER

The Data Analyzer displays the Network Connection dialog box, which is shown
in Figure 2-20.


Note


For a list of qualifiers you can use with the AVAIL/ANALYZER command, 
see the HP Availability Manager Installation Instructions, or enter HELP 
AVAIL at the DCL dollar prompt and then enter the qualifier. 


2.2.2 Starting the Data Analyzer on a Windows Node 


To start the Data Analyzer on a Windows node, first make sure that the 
Availability Manager Windows kit is installed on the node. 


To start the Data Analyzer, follow these steps: 
1. Click Start -> Programs.
2. Select Availability Manager.
3. Select Data Analyzer Startup.

The Data Analyzer displays the Network Connection dialog box, which is shown
in Figure 2-20.


2.3 Do You Need to Set Up a Data Server? 


At this point, you must determine whether you need to use a Data Server to 
communicate with the Data Collectors. For an overview of what a Data Server is 
and how it works, see Section 1.2.2. 


If the analyzer system is on the same LAN as the Data Collectors, you can use a 
network adapter on the analyzer system to connect with the Data Collectors. If 
this is the case, you do not need to set up the Data Server. To continue starting 
the Data Analyzer without a Data Server, go to Section 2.6. 


If the Data Analyzer is on a different LAN than the Data Collectors, you must 
set up the Data Server on a server system that is on the same LAN as the Data 
Collectors. To set up secure communication between the Data Analyzer and Data 
Server, see Section 2.4. 


Note 


The Data Collector on an OpenVMS system allows only one Data Analyzer
or Data Server to use it for communicating with other Data Collectors
(see Data Collector Notes in Section 1.2.1). If you want
to run both the Data Server and Data Analyzer on the same OpenVMS
system, HP recommends that you run the Data Server to communicate
with the other Data Collectors, and then let the Data Analyzer connect
to the Data Server. This setup is similar to the one shown in Figure 1-4
and described in Requesting and Receiving Information over a WAN in
Section 1.2.2. In this case, the Data Analyzer and Data Server
are running on the same node (the Data Server node) and use an internal IP
connection for communications.


2.4 Setting Up Secure Server Communications Between the Data 
Analyzer and Data Server 


Note 


The following terminology is used in the next sections:

• Data Server refers to the Availability Manager Data Server software.
• Server system refers to the hardware that runs the Data Server
  software.
• Analyzer system refers to the hardware that runs the Data Analyzer
  software.
• Combined kit refers to the kit that includes both the Data Analyzer
  and the Data Server kit.

Note the following:

• The server system and analyzer system can be either an OpenVMS
  system or a Windows system.
• Any analyzer system can connect to any server system. The operating
  system and hardware platform make no difference to the operation of
  the Availability Manager.


To collect data over a WAN, the Data Analyzer communicates with a Data Server. 
The Data Server is a Java-based program that runs on OpenVMS or Windows. 
Except for differences in starting the Data Server on OpenVMS and Windows, the 
following section applies to both operating systems. 


The Availability Manager uses an encrypted connection for secure communication
between the Data Analyzer and the Data Server. The following sections describe
how to set up the Data Analyzer and Data Server to use a secure communication
link.


2.4.1 Introduction to Secure Communications 


The Availability Manager uses Transport Layer Security (TLS) Version 1 for 
secure communication between the Data Analyzer and the Data Server. TLS is 
an extension of Secure Sockets Layer (SSL) Version 3.0, which is the most widely 
used protocol for security on the web. 


TLS uses public key cryptography (also called asymmetric cryptography) to
guarantee secure communication over a network. This type of cryptography uses
an encryption algorithm that produces a pair of keys:

• A public key, which provides authentication and is made public to any
  interested party as a trusted certificate
• A private key, which works with trusted certificates to provide privacy and
  data integrity

What one key encrypts, only the other key can decrypt. Together, these two keys
are known as an asymmetric key pair.
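

The property that what one key encrypts only the other key can decrypt can be demonstrated with a toy key pair. The Python sketch below uses textbook RSA with very small primes purely as an illustration. It is not secure, and it is not the algorithm or the code that the Availability Manager uses. All names and numbers in it are assumptions made for the example.

```python
# Toy textbook-RSA key pair with tiny primes: for illustration only, not secure.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8 or later)

public_key = (e, n)
private_key = (d, n)


def apply_key(key, number):
    """Transform a number (0 <= number < n) with one key of the pair."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = 65

# Encrypt with the public key; only the private key recovers the message.
ciphertext = apply_key(public_key, message)
recovered = apply_key(private_key, ciphertext)

# Transform with the private key; the public key reverses it (as in signing).
signature = apply_key(private_key, message)
verified = apply_key(public_key, signature)

print(recovered == message, verified == message)
```

The two directions correspond to the two uses described above: the public key lets any party check something produced with the private key, and the private key opens what was encrypted with the public key.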


Key Pairs, Key Stores, and Trust Stores 

Before you can use the Data Server, you must create an asymmetric key pair. 
This key pair is associated with the Data Server, and is used by the Data Server 
and Data Analyzer to establish an encrypted communication link between them. 


The Data Server stores the public and private key associated with it in a key
store. The Data Server key store is the file AM$KeyStore.jks and resides
on the server system. On OpenVMS systems, this file is in the AMDS$AM_CONFIG:
directory. On Windows systems, the key store is in the installation
folder. Currently, HP supports configurations in which the Data Server has only
one key pair in a key store.


The Data Server public key is also stored by the Data Analyzer in a trust
store on the analyzer system. The Data Analyzer trust store is the file
AM$TrustStore.jks. On OpenVMS systems, this file is in the AMDS$AM_CONFIG:
directory. On Windows systems, the trust store is in the installation
folder. A trust store for a particular Data Analyzer holds the public key for each
Data Server with which it communicates.


You create and store the key pair after installing either the combined kit (for 
OpenVMS) or the Availability Manager kit (for Windows). The next sections 
describe how to perform the following tasks: 


• Create the key pair from either the server or analyzer system
• Store the key pair in a key store on a server system
• Store the public key in a trust store on an analyzer system


2.4.2 Methods of Setting Up Secure Communications 


The key store and trust store are created and maintained by dialog boxes in the
Data Analyzer. The Data Analyzer is used for key management because it is the
part of the Availability Manager that has a GUI. Because key management uses
the GUI, keys are managed the same way on OpenVMS and Windows platforms.
This also spares the Data Server the overhead of the dialog boxes
used for creating and maintaining key and trust stores.


There are two basic methods of setting up secure communications. Both methods 
create a key store for the Data Server and a trust store for the Data Analyzer. 
The difference is that one creates the key store using the server system, and the 
other creates the key store from the analyzer system. Using one method or the 
other is sufficient to set up secure communications between the Data Analyzer 
and Data Server. 


2.4.2.1 Setup Using the Server System 
Creating the key store from the server system is the simplest method. You create 
the key store and export the public key using the Data Analyzer on the server 
system, copy the public key to the analyzer system, and import the public key 
with the Data Analyzer on the analyzer system. For a description of this method, 
see Section 2.4.3. 


Using this method assumes that you can use the Data Analyzer’s GUI
on the server system. You can start the Data Analyzer on the server system and
display the GUI on any of the following:


• The server graphics console
• Another OpenVMS system that does have a graphics console
• A Windows system that has software to accept and display an X Windows GUI


If this is not possible, use the alternate method to create and maintain key stores 
in Section 2.4.2.2. 


2.4.2.2 Setup Using the Analyzer System 
With this method, you create the key store and export the public key using the 
Data Analyzer on the analyzer system, and copy the key store to the server 
system. This method is described in Section 2.4.4. 


2.4.3 Steps for Setting Up Secure Communications from the Server System 


The following section describes how to set up the Data Server from the server 
system. It also describes the key setup for the Data Analyzer that runs on the 
server system. The procedure involves the following tasks: 


• Creating the key pair for the Data Server, including the option of generating
  and storing the trust store for the Data Analyzer on the server system
• Storing the key pair in the Data Server’s key store on the server system
• Storing the public key for another Data Analyzer to use


When you complete these steps, the Data Server can accept connections from any 
Data Analyzer on the server system or on other systems. 


2.4.3.1 Creating the Key Pair for the Data Server 


1. Start the Data Analyzer on the server system according to the instructions
   in Section 2.2. When the Data Analyzer starts, it displays the Network
   Connection dialog box as shown in Figure 2-1.


Figure 2-1 Network Connection Dialog Box

[Screen capture: the Network Connection dialog box, with a menu bar (Server, Analyzer, Key Stores), the prompt "Please select network adapters and/or Data Server to use for this session", a list of network adapters, Server (localhost) and Port (9819) fields, and a Trust Store... button.]

2. From the Server menu, select Key Store... to open the default key store for
   this system. The Availability Manager displays the Key Store Management
   dialog box as shown in Figure 2-2.


Figure 2-2 Key Store Management Dialog Box

[Screen capture: the Key Store Management dialog box for the Default Key Store, with Key Store and Keys menus, an empty list with the columns Alias, Entry Type, and X.500 Distinguished Name, and the status messages "Default Key Store loaded." and "There are no entries in the Key Store."]


3. In the Key Store Management dialog box, click New Key... to display the 
Generate New Key Pair dialog box as shown in Figure 2-3. 


Figure 2-3 Generate New Key Pair Dialog Box

[Screen capture: the Generate New Key Pair dialog box, with a Key algorithm field (DSA), the X.500 Distinguished Name fields Server Name (CN), Organizational Unit (OU), Organization (O), City or Locality (L), State or Province (ST), and Country code (2 letter) (C), a Validity field (90 days), the check box "Import the new key into the Default Trust Store.", and an Alias field.]


To create a new key pair, fill in the fields in this dialog box.


The information you enter in the Generate New Key Pair dialog box includes
fields that pertain to an X.500 Distinguished Name. HP recommends that
you enter the name of the server system in the Server Name (CN) field and
in the Alias field. ("Alias" is simply a name that is used to track items in the key
store and is not part of the generated key.)

Currently, the Availability Manager does not verify whether or not a key has
expired. Therefore, the Validity field is not used. However, for the field to
work in future versions, HP recommends that you enter a large value if you
are creating a key that must be valid for a long time.


To run the Data Analyzer on the server system and have it connect to the 
Data Server on the server system, check the Default Trust Store check box. 
This creates a trust store for the Data Analyzer that contains the public key 
for accessing the Data Server on the server system. 

When you finish entering information to create a new key pair for the Data 
Server, click Add (it might take a few seconds to create the key). If you 
checked the Default Trust Store check box, the default trust store for this key 
pair is created for the Data Analyzer running on the server system. 

The Key Store Management dialog box shown in Figure 2-4 now displays one
key pair, reflecting the information you entered in the Generate New Key Pair 
dialog box. 


Figure 2-4 Key Store Management Dialog Box Showing Key Pair

[Screen capture: the Key Store Management dialog box for the Default Key Store, listing one entry with the alias my_server, the entry type Private Key, and a distinguished name that begins CN=My_Server,OU=My_IT_Group,O=My_Company,L=My_City, and showing the status message "my_server added."]


If the server system is the only system on which you want to run the Data
Analyzer, do the following:


1. Click OK in the Key Store Management dialog box to save the key
store on the server system.


2. Follow the instructions in Section 2.6 to start and configure the Data 
Analyzer. 


To run the Data Analyzer on other systems, see Section 2.4.3.2. 


2.4.3.2 Export the Public Key for Other Data Analyzers 
To run the Data Analyzer on other systems, and to connect to the Data Server 
on this system, you must export the public key for the Data Server as a trusted 
certificate. To do this, click the key pair name in the Key Store Management 
dialog box. This action enables the Export... button. Click Export... to export the 
public key in a trusted certificate. The Availability Manager displays the Export 
Certificate dialog box as shown in Figure 2-5. 

Figure 2-5 Export Certificate Dialog Box

[Screen capture: the Export Certificate dialog box, with Look In set to HP Availability Manager V3.0, the file name my_server.cer, and the file type Key Certificates (*.cer).]


Store the trusted certificate in the folder and file name of your choice. Any file 
name with a CER extension works, although naming the file the same as the 
server alias can make it easier to identify. Click Export to complete this process. 


Important 


Remember the location of this certificate. This certificate is used in 
Section 2.4.5. 


2.4.3.3 Save the Key Store 


To save the key store on the server system, click OK in the Key Store 
Management dialog box. Then see Section 2.4.5 to import the trusted certificate 
into the Data Analyzer trust store. 


2.4.4 Steps for Setting Up Secure Communications from the Analyzer System 


The process for setting up the Data Server from an analyzer system involves the 
following tasks: 


• Creating the key store for the Data Server on the server system.
• Exporting the public key as a trusted certificate for other analyzer systems.
• Saving the key store.
• Copying the key store to the server system.
• Deleting the key and trust store from the analyzer system.
• Obtaining the public key from an existing Data Server by using an analyzer
  system.

2.4.4.1 Creating the Key Store for the Data Server
Start the Data Analyzer on the analyzer system. When the Data Analyzer starts, 
it displays the Network Connection dialog box as shown in Figure 2-6. 


Figure 2-6 Network Connection Dialog Box

[Screen capture: the Network Connection dialog box, with a menu bar (Server, Analyzer, Key Stores), the prompt "Please select network adapters and/or Data Server to use for this session", a list of network adapters, Server (localhost) and Port fields, and a Trust Store... button.]


From the Key Stores menu, click New Trust or Key Store.... The Availability 
Manager displays the Key Store Management dialog box, shown in Figure 2-7. 


Figure 2-7 Key Store Management Dialog Box

[Screen capture: the Key Store Management dialog box for a New Key or Trust Store, with an empty list with the columns Alias, Entry Type, and X.500 Distinguished Name, the buttons Delete, Export..., and New Key..., and the status message "The Key Store does not exist. It will be created the first time the Key Store is saved."]


In the Key Store Management dialog box, click New Key... to display the Generate 
New Key Pair dialog box as shown in Figure 2-8. To create a new key pair, fill in 
the fields in this dialog box. For a description of these fields, see Section 2.4.3.1. 

Figure 2-8 Generate New Key Pair Dialog Box

[Screen capture: the Generate New Key Pair dialog box, with a Key algorithm field (DSA), the X.500 Distinguished Name fields Server Name (CN), Organizational Unit (OU), Organization (O), City or Locality (L), State or Province (ST), and Country code (2 letter) (C), a Validity field (90 days), the check box "Import the new key into the Default Trust Store.", and an Alias field.]


When you finish entering information in the Generate New Key Pair dialog box, 
click Add (it might take a few seconds to create the key). If you checked the 
Default Trust Store check box, the default Trust Store for this key pair is created 
for the Data Analyzer running on this analyzer system.


The Key Store Management dialog box (Figure 2-9) now displays the new key 
pair, reflecting the information you entered. 


Figure 2-9 Key Store Management Dialog Box with One Entry

[Screen capture: the Key Store Management dialog box for a New Key or Trust Store, listing one entry with the alias my_server_name, the entry type Private Key, and a distinguished name that begins CN=My_Server_Name,OU=My_IT_Group,O=My_Company, and showing the status message "my_server_name added."]


This step finishes the setup needed for this analyzer system. If this is the only 
Data Analyzer that needs to connect to this Data Server, go to Section 2.4.4.4. 

2.4.4.2 Exporting the Public Key for Analyzer Systems


For other Data Analyzers that need to connect to the Data Server, export the 
public key as described in this section. 


In the Key Store Management dialog box, select the Data Server key pair by 
clicking the key entry. This enables the Export... button in the dialog box. Click 
Export... to extract the Data Server’s public key and store it in a file as a trusted 
certificate. 


The Export Certificate dialog box is displayed as shown in Figure 2-10. 


Figure 2-10 Export Certificate Dialog Box

[Screen capture: the Export Certificate dialog box, with the folder HP Availability Manager V3.0 selected, the file name my_server_name.cer, the file type Key Certificates (*.cer), and Export and Cancel buttons.]


Store the trusted certificate in the folder and file name of your choice. Any file 
name with the CER extension works, although accepting the default can make 
the file easier to identify. Click on the Export button to complete this process. 


Important 


Remember the location of this certificate. This certificate is used in 
Section 2.4.5. 


2.4.4.3 Saving the Key Store for the Server System 
Now that you have created the key pair for the Data Server, you must save the 
pair in a key store. In the Key Store Management dialog box, select the Key 
Store menu, and then select Save. This displays the Save Key Store dialog box as 
shown in Figure 2-11. 

Figure 2-11 Save Key Store Dialog Box

[Screen capture: the Save Key Store dialog box, with Look In set to HP Availability Manager V3.0, the file AM$TrustStore.jks listed, an empty File Name field, the file type Key, or Trust stores (*.jks), and Save and Cancel buttons.]


Note 


If you checked the Default Trust Store check box in Figure 2-8, the file 
AM$TrustStore.jks appears. 


Save the key store in the folder and file name of your choice. Any file name with 
a JKS extension works, although naming the file the same as the server alias can 
make the file easier to identify. Enter this file name in the File Name: field, and 
click Save to save the key store. In the Key Store Management dialog box, click 
Cancel to dismiss the dialog box. 


2.4.4.4 Copying the Key Store to the Server System 
The key store is now ready for the server system. Copy the file to the server 
system. If you use FTP to transfer the file, be sure to use the binary transfer 
mode. 
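
One way to confirm that a transfer did not alter the key store is to compare
a checksum of the file on both systems. The following Python sketch is not
part of the Availability Manager; the function names are hypothetical, and it
assumes that Python is available on the system where you run the check.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(original, copy):
    """Return True when both files contain identical bytes."""
    return sha256_of(original) == sha256_of(copy)
```

If the digests differ, the file was probably transferred in text (ASCII) mode
and must be copied again in binary mode.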


Once the file is copied, move it to the location and file name that the Data Server
looks for when it starts. On OpenVMS, the location is the AMDS$AM_CONFIG:
directory. On Windows, the location is the installation directory. Make sure that
the file is named AM$KeyStore.jks.


2.4.4.5 Delete the Key and Trust Store from the Analyzer System
Once you have created the key store and copied it to the server system, it is 
recommended that you delete the key and trust store on the analyzer system. 
This sets up the analyzer system to create a key store for another Data Server, 
or to create the trust store by importing the trusted certificates from each Data 
Server into the Data Analyzer. 


This concludes the Data Server setup on the server system. If you want to 
create a key store for another Data Server, go to Section 2.4.4. Otherwise, go to 
Section 2.4.5, which describes how to import the Data Server’s public key into the 
trust store of other Data Analyzers. 


The next section describes how to obtain the public key from an existing Data 
Server. This step allows the Data Analyzer to connect to the Data Server.


2.4.4.6 Obtaining the Public Key from an Existing Data Server


This section describes how to obtain a Data Server’s public key from the analyzer 
system. 


2.4.4.6.1 Copy the Key Store from the Server System
Copy the key store from the server system to a place that is accessible to the
analyzer system. On OpenVMS, the key store is AMDS$AM_CONFIG:AM$KEYSTORE.JKS.
On Windows, it is AM$KeyStore.jks in the Availability Manager installation
directory. If you use FTP, be sure to use the binary mode to transfer the key store
successfully.


2.4.4.6.2 Export the Key Store Public Key to a Trusted Certificate
This step extracts the Data Server public key from the key store by exporting it
to a trusted certificate.


Start the Data Analyzer on the analyzer system. When the Availability Manager 
starts, it displays the Network Connection dialog box as shown in Figure 2-12. 


Figure 2-12 Network Connection Dialog Box 


[Figure: Network Connection dialog box with the Server, Analyzer, and Key
Stores menus. It lists two network adapter entries (VMware Accelerated AMD
PCNet Adapter and VMware Accelerated AMD PCNet Adapter #2), a Server field set
to localhost with Port 9819, and a Trust Store... button.]


From the Key Stores menu, select Open Trust or Key Store... to open the Open 
Key or Trust Store dialog box as shown in Figure 2-13.


Figure 2-13 Open Key or Trust Store Dialog Box


[Figure: Open Key or Trust Store dialog box. The Key Stores folder lists
AM$KeyStore.jks; the dialog box has a File Name field, a Files of Type field
(Key, or Trust stores *.jks), and Open and Cancel buttons.]


In this dialog box, locate the key store file by selecting the name of the key 
store file, and clicking Open. The opened key store is displayed in the Key Store 
Management dialog box as shown in Figure 2-14. 


Figure 2-14 Key Store Management Dialog Box 


[Figure: Key Store Management dialog box showing the Default Key Store with
one entry: alias my_server, entry type Private Key, and an X.500 distinguished
name beginning CN=My_Server,OU=My_IT_Group,O=My_Company,L=My_City (truncated
in the figure). The Status area reads "Default Key Store loaded. There is one
entry in the Key Store." The dialog box has Export... and Cancel buttons.]


Select the key pair entry in the dialog box. This enables the Export... button. 
Click Export... to export the public key of the key pair into a trusted certificate. 
The Availability Manager displays the Export Certificate dialog box as shown in 
Figure 2-15.


Figure 2-15 Export Certificate Dialog Box


[Figure: Export Certificate dialog box opened on the Key Stores folder, with a
File Name field (my_server.cer), a Files of Type field (Key Certificates
*.cer), and Export and Cancel buttons.]


Store the trusted certificate in the folder and file name of your choice. Any file 
with the CER extension works, although accepting the default can make the 
file easier to identify. Click Export to complete this process. You now have the 
trusted certificate. 


Important 


Remember the location of this certificate. This certificate is used in 
Section 2.4.5. 


2.4.5 Key Setup for a Data Analyzer to Connect to an Existing Data Server 


This section describes how to set up a trust store for a Data Analyzer to connect 
to an existing Data Server. The steps involve the following tasks: 


• Obtaining the Data Server’s public key from its key store as a trusted
certificate.


• Copying the trusted certificate to the analyzer system.
• Importing the trusted certificate into the Data Analyzer’s trust store.


2.4.5.1 Obtaining the Data Server Public Key 


First enter the Data Server’s public key into the trust store of the Data Analyzer. 
This transfer involves exporting the key into a trusted certificate from the key 
store, and importing the key into the Data Analyzer’s trust store. 


The following sections describe how to export the public key into a trusted
certificate. If you need to export the public key, determine which of the following
applies to you.


• Section 2.4.3.2, Export the Public Key for Other Data Analyzers
• Section 2.4.4.2, Exporting the Public Key for Analyzer Systems
• Section 2.4.4.6.2, Export the Key Store Public Key to a Trusted Certificate


Make sure you have the Data Server’s public key in a trusted certificate for the 
next step.


2.4.5.2 Copying the Trusted Certificate
Copy the trusted certificate from the server system to the analyzer system. Note 
that the trusted certificate contains binary data, so you must use binary mode if 
FTP is the file transport. The certificate is now ready for importing to the Data 
Analyzer’s trust store. 


2.4.5.3 Importing the Data Server Public Key 
Start the Data Analyzer on the analyzer system. From the Analyzer menu, select 
Trust Store to open the default trust store for this system. The Availability 
Manager displays the Trust Store Management dialog box as shown in 
Figure 2-16. 


Figure 2-16 Trust Store Management Dialog Box 


[Figure: Trust Store Management dialog box showing the Default Trust Store
with the columns Alias, Entry Type, and X.500 Distinguished Name and no
entries. The Status area reads "Default Trust Store loaded. There are no
entries in the Trust Store."]


Click Import... to import the trusted certificate. The Availability Manager 
displays the Import Certificate dialog box as shown in Figure 2-17. 


Figure 2-17 Import Certificate Dialog Box 


[Figure: Import Certificate dialog box. The HP Availability Manager V3.0 folder
lists my_server.cer; the dialog box has a File Name field and a Files of Type
field (Key Certificates *.cer).]


Select the name of the trusted certificate, and click Import. The Availability
Manager displays the Assign Alias for Certificate dialog box as shown in 
Figure 2-18. 


Figure 2-18 Assign Alias for Certificate Dialog Box 


[Figure: Assign Alias for Certificate dialog box. It shows the file that the
certificate was imported from (my_server.cer in the HP Availability Manager
V3.0 folder) and the certificate details: Version V1; Subject CN=My_Server,
OU=My_IT_Group, O=My_Company, L=My_City, ST=My_State, C=My_Country Code;
Signature Algorithm SHA1withDSA, OID = 1.2.840.10040.4.3; and a Sun DSA Public
Key with its DSA parameters. The Assign Alias field contains my_server.]


This dialog box displays the trusted certificate. Enter the alias name for the 
certificate in the Assign Alias field. Although you can put any text in this field, 
it is best to choose the same alias name that the Data Server uses. Then click 
OK to continue. The Availability Manager displays the Trust Store Management 
dialog box with the imported key as shown in Figure 2-19. 


Figure 2-19 Trust Store Management Dialog Box 


[Figure: Trust Store Management dialog box showing the Default Trust Store
with one entry: alias my_server, entry type Trusted Certificate, and an X.500
distinguished name beginning CN=My_Server,OU=My_IT_Group,O=My_Company,L=My_City
(truncated in the figure). The Status area reads "'my_server' imported
successfully."]


In the Trust Store Management dialog box, click OK to save the trusted certificate 
in the Data Analyzer trust store.


This sets up the Data Analyzer to connect to a Data Server. The Data Analyzer
supports connections to multiple Data Servers. To connect to multiple Data 
Servers, export the public key for each Data Server and import it into the Data 
Analyzer. 


This completes the Data Analyzer key configuration. You are now ready to run 
the Data Analyzer and connect to the Data Server. 


2.5 Starting the Data Server 


This section describes tasks you must perform after the Availability Manager 
Data Server is installed. Starting the Data Server is somewhat different on 
OpenVMS than on Windows systems. However, on both systems, the Data Server 
listens for connections from Data Analyzers once it is started. 


The Data Server is designed to run in a minimal environment. It only outputs 
text messages to log various events and Data Analyzer connections. Because of 
this design, it can be run in a batch job on OpenVMS, or as a startup task on 
Windows. 


The following sections contain the sequence of steps required to start the Data 
Server on an OpenVMS node and a Windows node. 


The first step is to decide which platform is to run the Data Server: Windows or 
OpenVMS. 


2.5.1 Starting the Data Server on an OpenVMS System 


To start a Data Server on an OpenVMS system (Alpha or I64), make sure the
following conditions are met:


• The Data Server is installed on a node that is on the same LAN as your
OpenVMS systems.


• The Data Collector is started (see Section 2.1.3).

Starting the Data Collector is important for these reasons:

- It defines the various AMDS$* logicals needed by the Data Server.

- It allows the Data Server to communicate with the Data Collector on the network.


After you install and configure the Data Collector and Data Server and start the 
Data Collector, enter the following command to start the Data Server: 


$ AVAIL/SERVER 


Note 


For a list of qualifiers you can use with the AVAIL/SERVER command, 
see the HP Availability Manager Installation Instructions, or enter HELP 
AVAIL and then the qualifier name at the DCL dollar prompt. 


2.5.2 Starting the Data Server on Windows 


To install and configure the Availability Manager, follow the steps in the
HP Availability Manager Installation Instructions. Then, to start the Data
Server, click Start -> Programs -> HP Availability Manager -> Data Server
Startup.


2.5.3 Data Server Port and Firewalls


If you are running a firewall on your server system, ensure that the firewall 
allows communication over the port the Data Server uses. The default port 
number is 9819, and the type of connection for the port is TCP. 
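
To check whether a firewall is blocking the Data Server port, you can attempt
a plain TCP connection from the analyzer system. The following Python sketch
is not part of the Availability Manager; the function name is hypothetical.
It only shows that the port accepts TCP connections, not that the key and
trust stores are set up correctly.

```python
import socket

DEFAULT_DATA_SERVER_PORT = 9819  # default port stated in this section


def port_is_reachable(host, port=DEFAULT_DATA_SERVER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A result of False while the Data Server is running suggests that a firewall or
a routing problem lies between the two systems.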


2.6 Using the Network Connection Dialog Box to Start Collecting Data


The following section describes the steps needed to get the Data Analyzer to 
connect to one or more network adapters, or connect to one or more Data Servers. 
The Data Analyzer supports any combination of available network adapters and 
Data Servers. 


These steps assume that the Data Servers are already running on the server 
systems. 


Start the Data Analyzer on the analyzer system as described in Section 2.2. 
The Availability Manager displays the Network Connection dialog box, shown in 
Figure 2-20. 


Figure 2-20 Network Connection Dialog Box 


[Figure: Network Connection dialog box with the Server, Analyzer, and Key
Stores menus. It lists two network adapter entries (VMware Accelerated AMD
PCNet Adapter and VMware Accelerated AMD PCNet Adapter #2), a Server field set
to localhost with Port 9819, and a Trust Store... button.]


Figure 2-20 shows two entries for the two network adapters on this particular 
system. The last entry is where you enter the IP address and port number of a 
Data Server. To use one or more of these network adapters, check the check box 
to the left of each network adapter, and click OK. The Data Analyzer starts, using 
the network adapters you have chosen. To start using the Data Analyzer, see the 
instructions in Section 2.8. 


To connect to one or more Data Servers, enter the IP address of each server, 
along with the IP port that the Data Server uses for communication. There are a 
number of possible forms for the IP address: 


• Alphanumeric IP address - Alpha1.denver.newscorp.com
• Numeric IP address - 136.132.15.32
• WINS entry for a Windows system - WXPSRV1
• Analyzer system name synonym - Localhost


The default IP address shown in the dialog box is "localhost". Localhost is a
synonym for the IP address of the Analyzer system itself. Use the "localhost"
default or enter the IP address of the Data Server, enter the IP port that the
Data Server is using in the Port: field, and click the plus sign button to
register the entry. The data for the new Data Server entry is displayed in the
dialog box. You can repeat this process to enter all the Data Servers you want
to use.
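
The difference between the numeric form and the name forms of a Data Server
address can be illustrated with a short Python sketch. It is not part of the
Availability Manager; the function name and the dictionary layout are
hypothetical. Any entry that is not a numeric IP address is treated as a name
(an alphanumeric address, a WINS entry, or localhost) that the system must
resolve.

```python
import ipaddress

DEFAULT_PORT = 9819  # default Data Server port


def describe_server_entry(address, port=DEFAULT_PORT):
    """Classify a Data Server address as 'numeric' or 'name'."""
    address = address.strip()
    try:
        # ip_address raises ValueError for anything that is not an IP literal.
        ipaddress.ip_address(address)
        form = "numeric"
    except ValueError:
        form = "name"
    return {"address": address, "port": int(port), "form": form}
```

For example, describe_server_entry("136.132.15.32") is classified as numeric,
and describe_server_entry("WXPSRV1") is classified as a name.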


Note 


You can use the "localhost" name to allow more than one Data Analyzer
instance to access data from a particular network adapter on the system.
Figure 1-4 shows a configuration similar to the following example, which
illustrates how this is done.


For instance, Data Server node ACCPNT is connected to Data Collector
nodes Edmund and Lucy through network adapter A on ACCPNT. If you
start the Data Analyzer on ACCPNT and have it use adapter A to gather
data, this instance of the Data Analyzer is the only instance that can use
adapter A to access Edmund and Lucy. If you want more than one Data
Analyzer to access Edmund and Lucy through node ACCPNT, then use
the Data Server instead. Start the Data Server on ACCPNT and have
it use adapter A. Then you can start the Data Analyzer on ACCPNT,
use the "localhost" name to access the Data Server running on ACCPNT,
and gather data from Edmund and Lucy. Another person using the Data
Analyzer on a Data Analyzer node can also gather data from Edmund and
Lucy from ACCPNT by connecting to the Data Server on ACCPNT.


Using the Data Server in this manner allows you to run the Data 
Analyzer on a Data Server node without restricting access to its network 
adapters. 


Figure 2-21 shows an example of this procedure. The IP address entered is 
Aslan, the WINS entry for the Data Server system, and the port number entered 
is 9819. 


Figure 2-21 Network Connection Dialog Box with One Data Server Entry


Figure 2-22 shows the result of adding a second Data Server using the numeric
form of the IP address.


Figure 2-22 Network Connection Dialog Box with Two Data Server Entries 


[Figure: Network Connection dialog box listing two network adapter entries and
two Data Server entries: Server: Aslan, Port: 9819 and Server: 16.212.8.229,
Port: 9819.]


Figure 2-23 shows the result of adding a third Data Server using the
alphanumeric form of the IP address.


Figure 2-23 Network Connection Dialog Box with Three Data Server Entries 


[Figure: Network Connection dialog box listing the network adapter entries,
the Data Server entries (including Server: Aslan, Port: 9819), a Server field
set to localhost with Port 9819, and a Trust Store... button.]


To remove a Data Server entry from the Network Connection dialog box, click the 
delete button (X) to the right side of the Data Server entry. 


To start collecting data, check the network adapter and Data Server entries you 
want to use, and click OK. This process is described in Section 2.7. 


2.6.1 Additional Information About Key Stores 


This section contains some additional information about handling keys, key stores,
and trust stores.


2.6.1.1 Clarification of Network Connection Dialog Box Menus
Note the following:


• The Key Store menu item on the Server and Key Stores menus opens the
default Data Server key store (AM$KeyStore.jks). This default key store
name is what the Data Server uses when it starts. You can save key stores
with other file names, but when you copy the key store to the server system
for the Data Server to use, you must rename it to the default key store name.


• The Trust Store menu item on the Analyzer and Key Stores menus
and the Trust Store button open the default Data Analyzer trust store
(AM$TrustStore.jks). This default trust store name is what the Data Analyzer
uses when it starts. You can save trust stores with other file names, but when
you copy the trust store to the analyzer system for the Data Analyzer to use,
you must rename it to the default trust store name.


• The other menu items on the Key Stores menu open generic key or trust
stores that you are prompted to name when you open or save any of them.


2.6.1.2 Export and Import Made Easy 
The Availability Manager allows you to open multiple key and trust stores using 
the menus on the Network Connection dialog box. The Key Store and Trust Store 
Management dialog boxes allow you to drag and drop items interchangeably 
between dialog boxes (and to the file system or desktop on Windows). This 
operation can make import and export easier if you open the key and trust stores 
locally or if you use network shares to open them. 


2.6.1.3 Certificates 
The certificate that you create is a “self-signed” one. This means that the person 
who creates the certificate also signs off on its legitimacy. This type of certificate 
is also called a root certificate. 


2.7 Choosing Network Connections for Collecting Data 


When you start the Data Analyzer, it displays the Network Connection dialog box. 
This dialog box shows the available network adapters on the system, and any 
Data Servers that have been entered. You can choose which network adapters
and Data Servers the Data Analyzer uses for collecting data by checking the check
box of each entry.


Figure 2-24 shows a Network Connection dialog box with the two available 
network adapters on the system, and three Data Servers. Three of the entries 
are checked. Section 2.8 uses this example to document how to use the Data 
Analyzer.


Figure 2-24 Sample Network Connection Dialog Box with Three Checked Entries


[Figure: Network Connection dialog box with the Server, Analyzer, and Key
Stores menus, the network adapter and Data Server entries, and a Trust
Store... button.]


2.8 Using the System Overview Window 


After you click OK on the Network Connection dialog box, the Data Analyzer
displays the System Overview window (Figure 2-25) and monitors the network
for multicast “Hello” messages from nodes running the Data Collector. It follows
these steps:


1. After receiving a multicast “Hello” message from the Data Collector, the 
Data Analyzer attempts to connect to a node. This is called the attempting 
collection state. 


The Data Analyzer notifies you of this and other states in the System 
Overview window, which is shown in Figure 2-25. 


2. The Data Collector performs a security check on the Data Analyzer connection 
attempt. 


• If the Data Analyzer passes the security check while the Availability
Manager is attempting the connection, the connection succeeds, and data
collection starts. This is called the data collection state.


• If the Data Analyzer fails the security check, the node is in the
connection failed state.


3. While the Data Analyzer collects data, if a node goes down, or a network 
connection fails between the graphical user interface and the node, that node 
is placed in the path lost state. 
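
The state changes in these steps can be summarized as a small transition
table. The following Python sketch is only an illustration of the steps above,
not the Data Analyzer's implementation; the event names are hypothetical, and
only the transitions named in the steps are modeled.

```python
from enum import Enum


class NodeState(Enum):
    ATTEMPTING_COLLECTION = "attempting collection"
    DATA_COLLECTION = "data collection"
    CONNECTION_FAILED = "connection failed"
    PATH_LOST = "path lost"


# Hypothetical event names; each entry corresponds to one step in the list.
TRANSITIONS = {
    (NodeState.ATTEMPTING_COLLECTION, "security_check_passed"):
        NodeState.DATA_COLLECTION,
    (NodeState.ATTEMPTING_COLLECTION, "security_check_failed"):
        NodeState.CONNECTION_FAILED,
    (NodeState.DATA_COLLECTION, "node_down_or_network_failure"):
        NodeState.PATH_LOST,
}


def next_state(state, event):
    """Return the state after an event; events not modeled leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```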


The colors of the icons preceding each node name in Figure 2-25 indicate the
state of the node.


Figure 2-25 System Overview Window


[Figure: System Overview window showing the Group/Node pane and the Event
pane.]


The color code of each node state is explained in Table 2-2. 


Table 2-2 Explanation of Color Codes in the System Overview Window 


Color    Description

Brown    Attempts to configure nodes have failed, for example, because the nodes
         are in a connection failed state. A tooltip, which is described in
         Section 2.8.2.1, explains the reason for the failure.

Yellow   Nodes are in the attempting collection state; that is, the security
         check of the nodes is in progress. Nodes that remain in this state more
         than several seconds indicate network connectivity problems with the
         Data Analyzer.

Black    Nodes are in a path lost state; that is, the network path to the node
         has been lost or the node is not running.

Red      Nodes are in the data collection state (that is, they are collecting
         data), but the nodes have exceeded a threshold, causing events to be
         posted. Note that if an event causes the output of any message besides
         an informational one, a node is displayed in red.

Green    Nodes are in the data collection state; that is, the security check was
         successful, and the nodes are collecting data.


The System Overview window is divided into two segments, or panes: the
Group/Node pane and the Event pane.


2.8.1 Using the Group/Node Pane


When you start the Data Analyzer, the System Overview window (see
Figure 2-25) displays information on connection lines at the top of the pane
(that is, lines starting with “Device”, “Aslan”, and “16.212.8.229” in Figure 2-25).
The items on these lines measure throughput and congestion on each connection.
The following table describes the column headings.


Heading      Description

MEM          These numbers monitor the memory statistics of the Data Server.
             The first number is the amount of memory used. The second number
             is the total memory available. The colored bar represents the
             percentage of memory used. A blue bar is used for values up to
             60%, yellow up to 80%, and red up to 100%.

PFLTS        These numbers show the number of Data Analyzers connected to the
             Data Server and measure the delay from when a packet is queued
             from the Data Server LAN connection to when it is sent to the
             Data Analyzer. The delay is measured in milliseconds. The data is
             in the form C - X/A/N where

             C - Connection count
             X - Maximum delay
             A - Average delay
             N - Minimum delay

             The colored bar represents the average delay, with the maximum
             set at 500ms. A blue bar is used for values up to 250ms, yellow
             up to 400ms, and red above 400ms.

PFW/COM      These numbers measure the delay from when a packet is queued in
             the Data Analyzer to when it is written to the Data Server. The
             delay is measured in milliseconds. The data is in the form X/A/N
             where

             X - Maximum delay
             A - Average delay
             N - Minimum delay

             The colored bar represents the average delay, with the maximum
             set at 500ms. A blue bar is used for values up to 250ms, yellow
             up to 400ms, and red above 400ms.

BIO          These numbers monitor the packets that have been read using this
             connection, including multicast “Hello” messages for nodes that
             are not being monitored. The first number is the number of
             packets per second. The second number is the number of bytes per
             second. Note that for wide area network connections, this does
             not include any overhead that TCP/IP introduces when transmitting
             the data. The blue bar represents the number of packets read in
             the last monitoring interval. A full bar represents 50 or more
             packets per second.

DIO          The first (or only) number is the number of packets currently
             waiting on the server to be read on this connection. A number
             consistently greater than 0 indicates congestion or a failing
             connection. The yellow bar also reflects this number. A full bar
             represents 50 or more packets in the queue.

             The second number (when shown) is a count of the number of
             packets that have been discarded because the write queue on the
             server grew too large. A red bar indicates the number of packets
             that were discarded in the last monitoring interval. A full bar
             represents 50 or more packets discarded.

CPUQs        These numbers monitor the packets that have been written using
             this connection. The first number is the number of packets per
             second. The second number is the number of bytes per second. Note
             that for wide area network connections, this does not include any
             overhead that TCP/IP introduces when transmitting the data. The
             blue bar represents the number of packets written in the last
             monitoring interval. A full bar represents 50 or more packets per
             second.

EVENTS       The first (or only) number is the number of packets currently
             waiting to be written to the server on this connection. A number
             consistently greater than 0 indicates congestion. For a WAN
             connection, this might indicate a slow or failing connection. The
             yellow bar reflects this number. A full bar represents 50 or more
             packets in the queue.

             The second number (when shown) is a count of the number of
             packets that have been discarded because the write queue grew too
             large. The red bar indicates the number of packets that were
             discarded in the last monitoring interval. A full bar represents
             50 or more packets discarded.

PROC CT      The status shows the state of the server connection. If the
             status is ERROR or FAILED, the error text is in the HW Model
             field.

OS VERSION   The version and build number show which version of the
             Availability Manager the Data Server is running.
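

The bar color thresholds described in the table can be expressed as two small
functions. The following Python sketch is only an illustration, not the Data
Analyzer's code; the function names are hypothetical, and it assumes that a
value exactly on a boundary takes the lower color and that any average delay
above 400ms is shown in red.

```python
def memory_bar_color(percent_used):
    """Color of the memory bar: blue up to 60%, yellow up to 80%, else red."""
    if percent_used <= 60:
        return "blue"
    if percent_used <= 80:
        return "yellow"
    return "red"


def delay_bar_color(average_delay_ms):
    """Color of a delay bar: blue up to 250ms, yellow up to 400ms, else red."""
    if average_delay_ms <= 250:
        return "blue"
    if average_delay_ms <= 400:
        return "yellow"
    return "red"
```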


If the number of packets waiting or discarded is consistently large, you might 
notice that the data displayed in the application updates at a slower rate. In 
extreme cases, nodes might turn black, indicating a lost connection with the node 
when, in reality, the problem is the congestion between the Data Analyzer and 
the Data Server. 


If you have a problem with congestion, consider scaling back the number of nodes 
or the amount of data being collected, or lengthening the collection intervals. 


Note 


Most of these fields have a tooltip describing the field contents and some 
additional data. The tooltips can be rather large. To ensure that the 
tooltip stays up as long as you need to read it, move the mouse slightly 
over the field to keep the tooltip visible. 


The rest of the Group/Node pane displays information about the OpenVMS groups 
and nodes that the Data Analyzer has found. By default, within each group, the 
Data Analyzer displays the nodes with which it can establish a connection. (If 
the Data Analyzer finds Windows nodes, those are also displayed.) 


2.8.1.1 Setting Up Groups 
Groups are set up during installation on Data Collector nodes and are user- 
definable. Be sure to define groups by cluster membership. If a node is not a 
member of a cluster, then you can define a group by function, type of hardware, 
or geographical location. 


If you want to change the groups being monitored, you need to use a
customization option to make changes. See Section 7.4.1 for instructions.


Note


HP recommends that you define a cluster as its own group. This is 
necessary for the Lock Contention, Disk Summary, Disk Volume, and 
Cluster data collections to function correctly. 


2.8.1.2 Displaying Group Information 


Groups—and the nodes in each group with which the Data Analyzer is able to 
establish a connection—are displayed in the Group/Node pane of the System 
Overview window (see Figure 2-25). 


To display only groups in the Group/Node pane, click the handle in front of a 
group name to a horizontal position, and the nodes in that group are removed, 
as shown for both groups in Figure 2-26. (Clicking the handle into a vertical 
position displays nodes again.) 


Figure 2-26 Group Overview Pane


[Figure: Group/Node pane with the groups collapsed so that only the group name
rows are displayed.]


The numbers in parentheses after “OpenVMS” (in the Group/Node pane of the 
System Overview window) are the following: 


• The first number in parentheses is the total number of groups that are listed.


• The second number in parentheses is the total number of nodes in all the
listed groups with which the Data Analyzer can establish a connection.


On each group name row, following the name of the group, the number in 
parentheses is the number of nodes in that group with which the Data Analyzer 
has established a connection. 


On a group name row, under the OS Version heading, color-coded numbers
indicate the number of nodes in that group that are in each of the five
color-coded states. These states are explained in Table 2-2.


Additional summary information about the entire group is on the group line.
CPU, MEM, BIO, and DIO numbers are averages. The rest of the numbers are
totals for all of the nodes in the group.
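

The way a group line combines node values can be sketched as follows. This
Python example is only an illustration, not the Data Analyzer's code; the
function name and the dictionary layout are hypothetical. It averages the CPU,
MEM, BIO, and DIO columns and totals every other column.

```python
AVERAGED_COLUMNS = ("CPU", "MEM", "BIO", "DIO")


def summarize_group(nodes):
    """Build the group line from a list of per-node dictionaries."""
    if not nodes:
        return {}
    summary = {}
    for column in nodes[0]:
        total = sum(node[column] for node in nodes)
        if column in AVERAGED_COLUMNS:
            summary[column] = total / len(nodes)  # average over the nodes
        else:
            summary[column] = total               # total for the group
    return summary
```

For two nodes with CPU values of 10 and 30 and EVENTS values of 1 and 3, the
group line shows a CPU value of 20.0 and an EVENTS value of 4.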


Notice the small triangle in the BIO heading in Figure 2-26. The direction of
the triangle indicates that the nodes are sorted in descending order of BIO rates.
Click the triangle to reverse the sort order, or click another column header
to select a new item on which to sort data.


In the Group/Node pane, only nodes within a group are sorted. The groups
remain in alphabetical order.
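

The sorting behavior can be sketched as follows. This Python example is only
an illustration, not the Data Analyzer's code; the function name and the data
layout are hypothetical. Groups are always listed alphabetically, and only the
nodes inside each group are ordered by the selected column.

```python
def sorted_view(groups, column, descending=True):
    """Return (group name, sorted nodes) pairs for display.

    groups maps a group name to a list of per-node dictionaries.
    """
    view = []
    for name in sorted(groups):  # groups stay in alphabetical order
        nodes = sorted(groups[name], key=lambda node: node[column],
                       reverse=descending)
        view.append((name, nodes))
    return view
```

Calling sorted_view again with descending=False corresponds to clicking the
triangle to reverse the sort order.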


2.8.2 Displaying Node Information 


The Group/Node pane of the System Overview window allows you to focus on
resource usage activity at a high level and to display more specific data whenever
you want. This section explains the basic use of the Group/Node pane. For more
information, see Chapter 3.


You can sort groups in the Group Overview window by changing the sort order
of one of the data column headings (see Figure 2-26).


2.8.2.1 Displaying Summary Node Information 
Even when nodes are not displayed on the System Overview window or the
Group/Node pane, you can display important node information by placing the
cursor over a group name or icon. For example, holding the cursor over the
KOINE group name displays a tooltip similar to the one shown in Figure 2-27,
containing summary node information.


Figure 2-27 Tooltip Example: Summary Node Information 
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Group [DECAMDS] has 4 nodes 
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Note 


Most of these fields have a tooltip describing the field contents and some 
additional data. The tooltips can be rather large. To ensure that the 
tooltip stays up as long as you need to read it, move the mouse slightly 
over the field to keep the tooltip visible. 


Possible tooltip colors and their meanings are in Table 2-3. 


Table 2-3 Explanation of Tooltip Colors

Color    Meaning

Brown    Indicates why the configuration of the node failed.

Yellow   Shows number of Data Collector multicast “Hello” messages received and
         the number of attempts to configure the node (“Configuration packets
         sent”). Nodes that remain in this state more than several seconds
         indicate network connectivity problems with the Data Analyzer.

Black    Shows the following:

         For nodes that were in the data collection state (see Table 2-2), and
         communication was then lost:

         - When the connection to the node was lost (“Path lost at time”).
         - When that node was booted (“Boot time: time”).
         - What the uptime of the node was (“Uptime: time”).

         For nodes that were in the connection failed state (see Table 2-2):

         - When the connection to the node was lost (“Path lost at time”).
         - The reason the node was not configured.

Red      Nodes have exceeded a threshold, causing events to be posted for the
         node. If an event causes the output of any message besides an
         informational one, a node is displayed in red.

Green    The security check was successful, and the nodes are collecting data;
         node uptime is shown.


The Group/Node pane is designed to display monitored nodes in a single pane. 
This format works well for sites that have relatively few nodes to monitor. 
However, for large sites that have many groups and nodes, scrolling through the 
display can be time-consuming. To help those with large sites, two additional 
windows are available: 


• The Group Overview window

• The Single-Group window


2.8.2.2 Displaying a Group Overview Window 
The first window to help you view large sites is the Group Overview window. To
view all the group name row data easily, click on the View menu at the top of the
page and select “Group Overview.” The Group Overview window that is displayed
(Figure 2-28) is similar to the Group Overview pane in Figure 2-26.
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Figure 2-28 Group Overview Window 


Group Overview 
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This display is designed to provide an overview of all the groups being monitored. 
If you want more information about a group, place the cursor over the group
name or icon. A tooltip is displayed with additional information about nodes in
the group similar to the one displayed in Figure 2-27.


You can also double-click a group name to display a Single-Group window, as 
explained in Section 2.8.2.3. 


2.8.2.3 Displaying a Single-Group Window 
The second window to help you view large sites is the Single-Group window. This 
display shows the nodes in one group (see Figure 2-29). 


To obtain this display, you can also right-click the group name in the Group/Node 
pane and select the “Display” option. A separate window appears with only the 
nodes in the group you have selected (see Figure 2-29). This window is useful 
in simultaneously displaying groups that are not adjacent in the list in the 
Group/Node pane. 


Figure 2-29 OpenVMS Single-Group Window 
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Within each group of nodes displayed, the Data Analyzer displays all the nodes 
with which it can communicate. If some nodes in the group are not displayed, it 
is because the Data Analyzer has not received a multicast “Hello” message from 
the Data Collector on that node. 


The display includes the following items: 


• A list of the nodes in the group along with summary data for each node. In
Figure 2-25, the Debug cluster group contains 9 nodes.

• A color-coded monitor icon preceding each node name indicates the state of
the node. See Table 2-2 for explanations of the states these colors indicate.

• For various node data items, some graphs indicate the percentage of an item
that is being used; other graphs are totals.
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Green graphs indicate percentages below a customized threshold; red graphs 
indicate percentages above a customized threshold. Some data items are 
numbers, not percentages; for example, CPUs, CPU queues, and events. 


More information about node data is in Chapter 3. 


Somewhat different information is displayed for a group of Windows nodes. For 
more information, see Section 3.1.2. 


2.8.2.4 Focusing On a Specific Node 
To display more information about an individual node, double-click a node name
in the Single-Group window or the Group/Node pane. You can also right-click
a node name and select the “Display...” option. The Data Analyzer displays the
Node Summary page shown in Figure 2-30. (The data on this page is explained
in more detail in Chapter 3.)


Figure 2-30 OpenVMS Node Summary 
[Screen image: Node Summary page for OpenVMS Alpha node PRFE4S5.]


At the top of the Node Summary page are tabs that correspond to types of node 
data displayed in the Group/Node pane. If you double-click a field under a 
column heading in the Group/Node pane, the Data Analyzer displays a page that 
provides more information about that field. For example, if you double-click a value
under “CPU”, the Data Analyzer displays a page similar to the one shown in 
Figure 3-6. 
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2.8.2.5 Specifying Data to Be Collected 


By default, the only data collected for a node is the data displayed in the Node 
pane (Figure 2-29). This data is called a node summary data collection. The 
events in the Event pane of the System Overview window (see Figure 2-25) are
produced when node summary data is processed. See Appendix C for a list of 
events associated with node summary data. 


If you want to signal additional events that are listed in Appendix C, you must 
collect the data associated with those events. To collect this data by default, you 
must enable background data collection for the data. Background and foreground 
data collections are explained in more detail in Section 1.4.1.2. 


For OpenVMS nodes, if you want background data collection (and the associated 
event detection), you must turn on data collection for each type of data you want 
to collect. On Windows nodes, background data collection is always enabled and 
cannot be turned off. 


To turn on various types of data to be collected, follow these steps: 

1. In the System Overview window (Figure 2-25), click the Customize menu.

2. Click Customize OpenVMS....

3. Click the Data Collection tab.


The Data Analyzer then displays the Data Collection Customization page
(Figure 2-31).


Figure 2-31 Data Collection Customization 


[Screen image: Customization - OpenVMS Default Settings dialog box, Data
Collection tab. Each row has a Collect checkbox, a data collection name, and
collection interval values. Legible rows include Cluster summary, CPU mode,
CPU process, Lock contention, Memory, Node summary, Page/Swap file, Single
disk, and Single process.]


The following types of data are collected by default:

• Node summary

• Single disk

• Single process


To turn on a type of data collection, select the checkbox for that type of data 
collection in the “Collect” column. For example, to collect CPU process data, check 
the checkbox for “CPU process” in the Collect column. Clicking the checkbox 
again clears it. 


When you click a data collection name, the Explanation section at the bottom 
of the page tells where the data for a particular data collection is displayed. 
Table 7-3 summarizes this information. 


You cannot turn off the collection of single disk and single process data. These 
types of data are collected by default when you open a Single Disk Summary page 
or a Process Information page, respectively. 


On the Data Collection Customization page, you can change the intervals at 
which data is collected. Collection intervals are explained in Chapter 7. 


2.8.2.6 Sorting Data 
You can sort data in many OpenVMS displays. The following list provides some 
examples. To sort the values in a field, click the corresponding column heading. 
To reverse the sort order, click the column heading again. 


• Event pane of the System Overview window (Figure 2-25)

• CPU Process Summary pane (Figure 3-8)

• Memory page (Figure 3-10)

• Bottom pane of I/O Summary page (Figure 3-12)

• Disk Status Summary page (Figure 3-14)

• Disk Volume Summary page (Figure 3-16)


Depending on the field, you can sort data alphabetically or numerically. An 
alphabetical sort is performed using ASCII character values; for example, dollar 
signs ($) precede letters in the sort order. 
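As an illustration of ASCII ordering, the following sketch sorts a few made-up names by character value. Python's default string comparison orders ASCII text by character code, so it shows why a dollar sign sorts before letters; whether the Data Analyzer treats uppercase and lowercase letters exactly this way is an assumption.

```python
# Made-up names; "$" (code 36) sorts before digits, uppercase letters,
# "_" (code 95), and lowercase letters.
names = ["SYS$MANAGER", "ALPHA", "$1$DGA100", "_NODE", "beta"]

ascii_sorted = sorted(names)            # ascending by character code
ascii_reversed = sorted(names, reverse=True)  # a second click reverses the order

print(ascii_sorted)
print(ascii_reversed)
```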


2.8.3 Using the Event Pane 


The Event pane occupies the bottom part of the System Overview window
(Figure 2-25). In this pane, the Data Analyzer displays events that occur on all
the nodes being monitored on your system, including nodes that might not be 
displayed currently in the Group/Node pane. 


Events signal potential problems that might require further investigation. An 
event must reach a certain level of severity to be displayed. You can customize 
the severity levels at which events are displayed (see Chapter 7). For more 
information about displaying events, see Chapter 5. 


The events that are signaled depend on the types of data collection that are
performed (see Section 2.8.2.5). 


In the System Overview window, you can change the size of the panes as well as
the width of specific fields. To move the border between fields, place the mouse
pointer on the border until a double-headed arrow is displayed, and then drag
the border to the right or left.


Scroll bars indicate whether you are displaying all or part of a pane. For example, 
clicking a right arrow on a scroll bar allows you to view the rightmost portion of 
a screen. 
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2.8.4 Other System Overview Window Components 
In addition to panes, the System Overview window (Figure 2-25) also includes
features such as a title bar, menu bar, and status bar:


Title bar

The title bar runs across the top of the window and contains the product name
and version.


Menu bar

The menu bar, immediately below the title bar, contains the following menu
options:

• File
  The File menu contains the Exit option, which allows you to stop the Data
  Analyzer and close the window.

• Customize
  The Customize menu contains options that allow you to customize various
  aspects of the Data Analyzer. These options are explained in Chapter 7.

• Help
  The Help menu offers different types of online help for the Data Analyzer.
  These options are explained in Section 2.9.


Status bar

The status bar, which runs across the bottom of the window, displays the
following:

• The name of the selected group and the number of nodes in that group.

• The Java Virtual Machine memory statistics: the current amount of memory
used and the maximum amount of memory. If the current amount of memory
stays close to or equal to the maximum amount, the lack of memory can cause
various odd behaviors, including data collections for nodes that hang and
dialog boxes that cannot be displayed.

• The current time in a colored box. The color of the box ranges from green (the
Data Analyzer is keeping up with the amount of incoming data) to red (the Data
Analyzer is having trouble processing the amount of incoming data), with
various shades in between.
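The memory statistics in the status bar appear in the sample displays as a pair such as 6157K/650988K. Assuming that the first value is the current amount of memory and the second is the maximum, a rough check of how close the Data Analyzer is to its memory limit could look like the following sketch. The function names and the 0.9 warning ratio are hypothetical and are not product settings.

```python
def parse_kilobytes(text):
    """Convert a string such as '6157K' to an integer number of kilobytes."""
    text = text.strip()
    if not text.endswith("K"):
        raise ValueError(f"expected a value ending in 'K': {text!r}")
    return int(text[:-1])


def memory_usage(status_field, warn_ratio=0.9):
    """Return (used, maximum, ratio, near_limit) for a 'usedK/maxK' string."""
    used_text, max_text = status_field.split("/")
    used = parse_kilobytes(used_text)
    maximum = parse_kilobytes(max_text)
    ratio = used / maximum
    return used, maximum, ratio, ratio >= warn_ratio


print(memory_usage("6157K/650988K"))
```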

Displaying More Information at Any Time 

In the initial System Overview window (Figure 2-25), which is displayed by
default, you can perform the following actions at any time during the display:

• Click on a field to select it.

• Double-click most fields to display a page containing information specific to
that field.

• Right-click a field to display a shortcut menu with additional choices on it.
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2.9 Getting Help 


To obtain online help, click on the Help menu on the System Overview window 
menu bar. Then choose one of the following options, which are displayed at the 
top of the page. 


Menu Option              Description

Availability Manager     Information about using the Availability Manager.
User Manual

Getting Started          A special online version of help for getting started
                         using this tool.

Availability Manager     Last-minute information about the software and how it
Release Notes            works.

About Availability       Information about this Availability Manager Data
Manager...               Analyzer release (such as the copyright date).


2.10 Printing a Display 


The Data Analyzer does not provide a printscreen capability. However, you can 
capture Data Analyzer displays and print them by following these steps: 


1. Click on the selected Data Analyzer display to make it your active window.

2. Press the key combination Alt + PrintScreen.

   This action copies the image of the display into your copy buffer. (To capture
   the entire screen, press Ctrl + PrintScreen.)

3. Run the Windows Paint program:

   Start --> Programs --> Accessories --> Paint

4. Do one of the following:

   • Press the key combination Ctrl + V.
   • From Paint’s Edit menu, select Paste.

5. Then do one of the following:

   • Select an option from Paint’s File menu. For example:

     - Save or Save As...: to name the file containing the display image and
       place it in a directory that you specify.

     - Print: to print the display image on a printer that you select.

   • Use one of Paint’s editing options to edit the display image before saving
     or printing it.
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Note 


Before you start this chapter, be sure to read the explanation of data 
collection, events, thresholds, and occurrences as well as background and 
foreground data collection in Chapter 1. HP also recommends completing 
the getting-started steps described in Chapter 2. 


Node summary data is the only data that is collected by default. The Data 
Analyzer looks for events only in data that is being collected. 


You can collect additional data in either of the following ways: 


• Opening any display page that contains node-specific data (for example, CPU,
memory, I/O) automatically starts foreground data collection and event
analysis, except for Lock Contention and Cluster Summary information.
(You must select these tabs individually to start foreground data collection.)
Collection and evaluation continue as long as a page with node-specific data
is displayed.


• Selecting a checkbox on the Data Collection Customization page (which you
can select on the Customize OpenVMS... menu) enables background collection
of that type of data. Data is collected and events are analyzed continuously
until you clear the checkbox.
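The interaction between foreground and background collection described above can be pictured with a small decision function. This is a simplified reading of the text, not the product's logic: the function name, its parameters, and the data type strings are hypothetical, and single disk and single process data are left out.

```python
# Data types that start foreground collection only when their own tab is
# selected, according to the description above.
SELECT_TAB_FIRST = {"Lock contention", "Cluster summary"}


def is_collected(data_type, background_enabled, node_page_open, selected_tab=None):
    """Return True if data_type would be collected under this simplified model."""
    if data_type == "Node summary":
        return True                    # collected by default
    if data_type in background_enabled:
        return True                    # background collection is turned on
    if not node_page_open:
        return False                   # nothing starts foreground collection
    if data_type in SELECT_TAB_FIRST:
        return selected_tab == data_type
    return True                        # foreground collection from an open page


print(is_collected("Memory", set(), node_page_open=True))
print(is_collected("Lock contention", set(), node_page_open=True))
```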


For additional information about how to change these settings, see Chapter 7. 


This chapter describes the node data that the Data Analyzer displays by default 
and more detailed data that you can choose to display. Differences are noted 
whenever information displayed for OpenVMS nodes differs from that displayed 
for Windows nodes. 


Although Cluster Summary is one of the tabs displayed on the OpenVMS Node 
Summary page (Figure 3-4), see Chapter 4 for a detailed discussion of OpenVMS 
Cluster data. 


Note 


On many node displays, you can hold the cursor over a data field or 
column header to display an explanation of that field or header in a small 
rectangle, called a tooltip. Figure 3-2 contains an example. 


Some tooltips can be rather large. To ensure that the tooltip stays up as 
long as you need to read it, move the mouse slightly over the field to keep 
the tooltip visible. 
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3.1 Group/Node Pane 


The Data Analyzer automatically displays data for each node within the groups 
displayed in the Group/Node pane of the Application window (Figure 3-1). 


Figure 3-1 OpenVMS Group/Node Pane 
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Recall that the colors of the icons represent the following states: 


Color    Description

Brown    Attempts to configure the node have failed, for example, because the
         nodes are in a connection failed state.

Yellow   Node security check is in progress.

Black    Network path to node has been lost, or the node is not running.

Red      Security check was successful. However, a threshold has been
         exceeded, and an event has been posted.

Green    Security check was successful; data is being collected.


If you hold the cursor over a node name, the Data Analyzer displays a tooltip 
explaining the specific reason for the color that precedes the node name. By 
holding the cursor over many column headers and some data items on Data 
Analyzer screens, you can display tooltips. Figure 3-2 is an example of a tooltip 
that explains the BIO column header in the Group/Node pane. 


Figure 3-2 Sample Tooltip 
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The colors and their meanings are in Table 3-1. 


Table 3-1 Explanation of Tooltip Colors in the Group/Node Pane

Color    Meaning

Brown    Indicates why the configuration of the node failed.

Yellow   Shows number of RM Driver multicast “Hello” messages and the number of
         attempts to configure the node (“Configuration packets sent”). Nodes
         that remain in this state more than a few seconds indicate network
         connectivity problems with the Data Analyzer.

Black    Shows one of the following:

         If the node was successfully configured and then lost,

         - When the connection to the node was lost (“Path lost at time”).
         - When that node was booted (“Boot time: time”).
         - What the uptime of the node was (“Uptime: time”).

         If the node was never configured,

         - When the connection to the node was lost (“Path lost at time”).
         - The reason the node was not configured.

Red      If an event causes the output of any message besides an informational
         one, a node is displayed in red.

Green    Nodes are in the data collection state.


The following sections describe the data displayed for OpenVMS and Windows 
Group/Node panes. 


3.1.1 OpenVMS Node Data 


Node data with a graph displayed in red indicates that the amount is above the 
threshold set for the field. For each OpenVMS node and group it recognizes, the 
Data Analyzer displays the data described in Table 3-2. This table also lists the 
abbreviation of the event that is related to each type of data, where applicable. 
See Section 7.8 for information about setting event thresholds. Appendix B 
describes OpenVMS and Windows events. 


Note that you can sort the order in which data is displayed in the Node pane by
clicking a column header. To reverse the sort order of a column of data, click the
column header again.


Table 3-2 OpenVMS Node Data

Data          Description of Data                                 Related Event

Node Name     Name of the node being monitored.                   n/a

CPU¹          Percentage of CPU usage of all processes on the     HICOMQ
              node.                                               HIMTTO
                                                                  PRCCUR
                                                                  PRCPUL

Active CPUs   The number of active CPUs over the number of        n/a
              CPUs in the potential set. The potential set is
              the maximum number of CPUs available to the
              node.

MEM           Percentage of space in memory that all processes    LOMEMY
              on the node use.

PFLTS         Total page faults and hard page faults per second   HITTLP
              for all processes on the node.                      HIHRDP

PFW/COM       Number of processes in page fault wait (PFW)        HICOMQ
              and compute (COM) states.                           HIPFWQ

BIO           Buffered I/O rate of processes on the node.         HIBIOR

DIO           Direct I/O usage of processes on the node.          HIDIOR

CPU Qs        Number of processes in one of the following         HICMOQ
              states: COMO, MWAIT, COLPG, FPG.                    HIMWTQ
                                                                  HIPWTQ

Events        Number of triggered events that are associated      List of relevant
              with this node.                                     events

Proc Ct       Actual count of processes over the maximum          HIPRCT
              number of processes. Percentage of actual to
              maximum processes.

OS Version    Version of the operating system on the node.        NOPLIB
                                                                  UNSUPP

HW Model      Hardware model of the node.                         NOPLIB
                                                                  UNSUPP

HW Arch       Hardware architecture: Alpha or VAX                 n/a

DC            The Data Collector capability level and Managed     MINCAP
              Object registration retrieval status.

              Each version of the Data Collector has a
              capability level associated with it. This value
              tells the Data Analyzer what capabilities the
              Data Collector has (for example, the ability to
              execute disk volume fixes). If the capability
              value is below what the Data Analyzer supports,
              a MINCAP event is signaled, the node is put in
              the connection failed state, and no data is
              collected from the node.

              The Managed Object registration retrieval status
              indicates whether or not the Data Analyzer
              could get the data indicating what Managed
              Objects have registered with the Data Collector.
              Managed Objects are described more fully in
              Chapter 4.

              The values for the Managed Object registration
              status are as follows:

              Status  Description

              D       Done. Managed Objects are not supported by
                      the Data Collector. The Data Analyzer
                      collects only the data that the Data
                      Collector supports.

              NS      Not Sent. The Data Collector supports
                      Managed Objects. The request for the
                      registration data has not been sent.

              S       Sent. The request for the registration
                      data has been sent, and the Data Analyzer
                      is waiting for the response.

              V       Valid. The registration has been received
                      and processed by the Data Analyzer.

              E       Error. There was an error in getting the
                      registration data from the Data Collector.

¹By default, the CPU heading follows Node Name on a line of Node pane data. You
can use the cursor to move a column heading to another location on the line, if
you like.
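The DC column rules can be summarized in a short sketch. The status codes and their meanings come from the table above; the function names, the use of integers for capability levels, and the shape of the returned dictionary are assumptions made for illustration only.

```python
# Managed Object registration status codes, as listed in Table 3-2.
REGISTRATION_STATUS = {
    "D": "Done",
    "NS": "Not Sent",
    "S": "Sent",
    "V": "Valid",
    "E": "Error",
}


def describe_status(code):
    """Return the meaning of a registration status code, or 'Unknown'."""
    return REGISTRATION_STATUS.get(code, "Unknown")


def check_capability(collector_level, minimum_level):
    """Model the MINCAP rule; integer capability levels are an assumption."""
    if collector_level < minimum_level:
        # Below the minimum: MINCAP is signaled and no data is collected.
        return {"event": "MINCAP", "state": "connection failed", "collect": False}
    return {"event": None, "state": None, "collect": True}


print(describe_status("NS"))
print(check_capability(collector_level=2, minimum_level=3))
```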


3.1.2 Windows Node Pane 


Figure 3-3 is an example of a Windows Node pane. From the group you select, 
the Data Analyzer displays all the nodes with which it can communicate. 
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Figure 3-3 Windows Node Pane 
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For each Windows node in the group, the Data Analyzer displays the data 
described in Table 3-3. 


Table 3-3 Windows Node Data 


Data Description 

Node Name Name of the node being monitored. 

CPU Percentage of CPU usage of all the processes on the node. 

MEM Percentage of memory that is in use. 

DIO Direct I/O usage of processes on the node. 

Processes Number of processes on the node. 

Threads Number of threads on the node. A thread is a basic executable 
entity that can execute instructions in a processor. 

Events The number of events on the node. An event is used when two or 
more threads want to synchronize execution. 

Semaphores The number of semaphores on the node. Threads use semaphores 
to control access to data structures that they share with other 
threads. 

Mutexes The number of mutexes on the node. Threads use mutexes to 
ensure that only one thread executes a section of code at a time. 

Sections The number of sections on the node. A section is a portion of 
virtual memory created by a process for storing data. A process 
can share sections with other processes. 

OS Version Version of the operating system on the node. 

HW Model Hardware model of the node. 


3.2 Node Data Pages 


The following sections describe node data pages, which you can display in any of
the following ways:

• Double-click a data item in the Group/Node or Node pane to display an
associated page.

• Double-click a node name on the Group/Node or Node pane to display a
Node Summary page (Figure 3-4). You can then click other tabs on the
Node Summary page to display the same detailed data that you display by
double-clicking a data item in the Group/Node or Node pane.

• Double-click an event in the Event pane.


The menu bar on each node data page contains the options described in
Table 3-4.
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Table 3-4 Node Data Page Menu Bar 


Menu Option   Description                                         For More Information

File          Contains the Close option, which you can choose     n/a
              to exit from the pages.

View          Contains options that allow you to view data        See specific pages.
              from another perspective.

Fix           Contains options that allow you to resolve various  Chapter 6
              resource availability problems and improve
              system performance.

Customize     Contains options that allow you to organize data    Chapter 7
              collection and analysis and to display data by
              filtering and customizing data collected from
              Data Collectors.


The following sections describe individual node data pages. 


3.2.1 Node Summary 


When you double-click a node name, operating system (OS) version, or hardware 
model in an OpenVMS Group/Node pane (Figure 2-25) or a Windows Node pane
(Figure 3-3), the Data Analyzer displays the Node Summary page (Figure 3-4). 


Figure 3-4 Node Summary 


[Screen image: Node Summary page for OpenVMS Alpha node PRFE4S5, showing
Model (AlphaServer ES45 Model 2), OS Version (OpenVMS V8.2), Uptime, Memory
(8.00 GB), Active CPUs (4), Configured CPUs (4), CPU Architecture (Alpha),
Max RADs, Serial Number, and Galaxy ID.]


On this page, the following information is displayed for the selected node: 
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Data Description 

Model System hardware model name. 

OS Version Name and version of the operating system. 

Uptime Time (in days, hours, minutes, and seconds) since the last reboot. 

Memory Total amount of physical memory (in MBs or GBs) found on the system. 

Active CPUs Number of CPUs running on the node. 

Configured CPUs Number of CPUs that are configured to run on the node.

Max RADs Maximum number of resource affinity domains (RADs) for this node.

Serial Number The system’s hardware serial number retrieved from the Hardware
Restart Parameter Block (HWRPB).

Galaxy ID The Galaxy ID uniquely identifies a Galaxy. Instances in the same
Galaxy have the same Galaxy ID.


3.2.2 CPU Modes and Process Summaries 


By clicking the CPU tab, you can display CPU panes that contain more detailed
statistics about CPU mode usage and process summaries than the Node Summary
does. You can use the CPU panes to diagnose issues that CPU-intensive users
or CPU bottlenecks might cause. For OpenVMS nodes, you can also display
information about specific CPU processes.


When you double-click a value under the CPU or CPU Qs heading on either 
an OpenVMS Group/Node or a Windows Node pane, or when you click the 
CPU tab, the Data Analyzer displays the CPU Mode Summary in the top pane 
(Figure 3-6) and, by default, CPU Mode Details (Figure 3-7) in the lower pane. 
You can use the View menu to select the CPU Process Summary in the lower 
pane (Section 3.2.2.4). 


CPU mode summaries and process summary panes are described in the following 
sections. Note that there are differences between the pages displayed for 
OpenVMS and Windows nodes. 

3.2.2.1 Windows CPU Modes


Figure 3-5 provides an example of a Windows CPU Modes page. The sample 
page contains values for the three CPU modes—user, privileged, and null. 
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Figure 3-5 Windows CPU Modes 


[Screen image: CPU page for Windows NT Intel node AFFC36. The top pane shows
current and extreme values for the User, Privileged, and Null modes and for
DPCs Queued/sec and Interrupts/sec. The lower pane shows DPCs Queued, DPC
Rate, DPC Bypasses, and APC Bypasses for each CPU.]


The top pane of the Windows CPU Modes page is a summary of Windows CPU 
usage, listed by type of mode. 


On the left, the following CPU modes are listed:

• User

• Privileged

• Null


On the graph, values that exceed thresholds are displayed in red. To the right of
the graph are current and extreme amounts for each mode.


Current and extreme amounts are also displayed for the following values:

• Deferred procedure calls (DPCs) queued per second

• Interrupts that occurred per second


The lower pane of the Windows CPU Modes page contains mode details. The
following data is displayed:


Data Description 

CPU ID Decimal value representing the identity of a processor in a 
multiprocessing system. On a uniprocessor, this value is always 
CPU #00. 

Mode % Graphical representation of the percentage of active modes on that 
CPU. The color displayed matches the mode color on the graph on the 
top pane. 

DPCs Queued Rate that deferred procedure call (DPC) objects are queued to this 
processor’s DPC queue. 

DPC Rate Average rate that DPC objects are queued to this processor’s DPC
queue per clock tick.

DPC Bypasses Rate that dispatch interrupts were short-circuited.

APC Bypasses Rate that kernel asynchronous procedure call (APC) interrupts were
short-circuited.


3.2.2.2 OpenVMS CPU Mode Summary and Process States 


Figure 3-6 shows sample OpenVMS CPU Mode Summary and CPU Process 
States, which are the left and right top panes of the CPU Modes page. 


Figure 3-6 OpenVMS CPU Mode Summary and Process States 
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CPU Mode Summary 


In the CPU Mode Summary section of the pane, percentages are averaged across 
all the CPUs and are displayed as a single value on symmetric multiprocessing 
(SMP) nodes. 


To the left of the graph is a list of CPU modes. The bars in the graph represent 
the percentage of CPU cycles used for each mode. To the right of the graph are 
current and extreme percentages of time spent in each mode. 
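As a sketch of the averaging described above for SMP nodes, the following function averages each mode's percentage across all CPUs to produce the single value shown in the summary. The function name, the mode names, and the sample figures are illustrative assumptions.

```python
def mode_summary(per_cpu):
    """Average each mode's percentage across all CPUs in the list."""
    if not per_cpu:
        return {}
    return {
        mode: sum(cpu[mode] for cpu in per_cpu) / len(per_cpu)
        for mode in per_cpu[0]
    }


# Made-up per-CPU percentages for a two-CPU node.
sample_cpus = [
    {"kernel": 10, "user": 20, "null": 70},
    {"kernel": 20, "user": 40, "null": 40},
]
print(mode_summary(sample_cpus))
```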


Below the graph, the Data Analyzer displays the COM and WAIT process queues: 


• COM: The value displayed is the number of processes in the COM and COMO
states.

• WAIT: The value displayed is the number of processes in the miscellaneous
WAIT, MWAIT, COLPG, CEF, PFW, and FPG states.
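The two queue values can be thought of as sums over per-state process counts, as in this sketch. The state lists follow the descriptions above; the function name, the `WAIT` key used for the miscellaneous wait state, and the sample counts are hypothetical.

```python
COM_STATES = ("COM", "COMO")
WAIT_STATES = ("WAIT", "MWAIT", "COLPG", "CEF", "PFW", "FPG")


def queue_lengths(state_counts):
    """Return (com, wait) totals from a mapping of process state to count."""
    com = sum(state_counts.get(state, 0) for state in COM_STATES)
    wait = sum(state_counts.get(state, 0) for state in WAIT_STATES)
    return com, wait


# Made-up counts; states outside the two lists (LEF, HIB) are ignored.
sample_counts = {"COM": 3, "COMO": 1, "WAIT": 2, "MWAIT": 1,
                 "CEF": 4, "PFW": 1, "LEF": 20, "HIB": 35}
print(queue_lengths(sample_counts))
```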


CPU Process States 

The right side of Figure 3-6 shows a sample CPU Process States display. Note 
that the value for MWAIT, in the left column, is the sum of all values for the 
states in the two right columns. 


This display shows the number of processes in each process state. This number
is tallied from the data in the CPU Process view of the CPU page (Figure 3-6). For
systems with many processes, the data in the CPU Process view is collected
in segments over a short period of time because the amount of data a network
packet can contain is limited. Because of this, the number of processes in the
Process States pane might differ slightly from what is reported in $ MONITOR
STATES.
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Appendix A contains explanations of the CPU process states. 
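
The tally described in this section can be pictured with a short sketch. It is only an illustration: the `processes` list and its `state` key are hypothetical stand-ins for the per-process data in the CPU Process view, not part of the Data Analyzer. The sketch counts processes per state and derives the COM queue value as the sum of the COM and COMO counts.

```python
from collections import Counter


def tally_states(processes):
    """Count how many processes are in each process state."""
    return Counter(p["state"] for p in processes)


def com_queue(counts):
    """COM queue value: number of processes in the COM and COMO states."""
    return counts["COM"] + counts["COMO"]


# Hypothetical sample records; real data comes from the CPU Process view.
processes = [
    {"pid": "21600601", "state": "COM"},
    {"pid": "21600602", "state": "COM"},
    {"pid": "216005FF", "state": "HIB"},
    {"pid": "21600610", "state": "COMO"},
]

counts = tally_states(processes)
print(counts["COM"], com_queue(counts))
```

Because the real data arrives in segments, a tally like this reflects a slightly different instant for each segment, which is why the totals can differ from other tools.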


3.2.2.3 OpenVMS CPU Mode Details

The lower pane of the CPU Modes page contains CPU mode details, as shown in
Figure 3-7.


Figure 3-7 OpenVMS CPU Mode Details Pane 

[Screen capture: CPU Mode Details pane, including the Process Name and Capabilities columns. Status bar: OpenVMS Alpha node QTV18 - CPU Summary - physical modes view: 30 physical cpus (30 listed, 0 filtered out)]


In the OpenVMS CPU Mode Details pane, the following data is displayed: 


Data            Description

CPU ID          Decimal value representing the identity of a processor in a
                multiprocessing system. On a uniprocessor, this value is
                always CPU #00.

State           One of the following CPU states: Boot, Booted, Init, Rejected,
                Reserved, Run, Stopped, Stopping, or Timeout.

Mode %          Graphical representation of the percentage of active modes on
                that CPU. The color displayed coincides with the mode color in
                the graph in the top pane.

PID             Process identifier (PID) value of the process that is using
                the CPU. If the PID is unknown to the Data Analyzer
                application, the internal PID (IPID) is listed.

Process Name    Name of the process active on the CPU. If no active process is
                found on the CPU, the name is listed as *** None ***.

Capabilities    One or more of the following CPU capabilities or flags:
                • Capabilities: Primary, Quorum, Run, or Vector.
                • Flags: Idle, Lckmgr, Fastpath_CPU, Fastpath_Ports,
                  Low_power, and Cothread_of_nn.

RAD             Number of the RAD where the CPU exists.


The status bar in the OpenVMS CPU Mode Details pane (see Figure 3-7) shows 
the potential number of physical CPUs on the node, the number that are listed, 
and the number that are filtered out. The status bar is updated with each data 
collection. The data collection rate is determined by the customization of CPU 
mode data collection intervals. See Section 7.5 for instructions on how to change 
data collection intervals. 


3.2.2.4 OpenVMS CPU Process Summary 


To display the OpenVMS CPU Process Summary pane at the bottom of the CPU 
page, select CPU Process Summary from the View menu (Figure 3-6). Figure 3-8 
shows a sample OpenVMS CPU Process Summary pane. 


Figure 3-8 OpenVMS CPU Process Summary Pane 


[Screen capture: OpenVMS CPU Process Summary pane with columns PID, Process Name, Priority, State, Rate, Wait, Time, and Home RAD]


The OpenVMS CPU Process Summary pane displays the following data: 


Data Description 

PID Process identifier, a 32-bit value that uniquely identifies a process. 

Process Name Name of the process active on the CPU. 

Priority Computable (xx) and base (yy) process priority in the format xx/yy. 

State One of the process states listed in Appendix A. 

Rate Percentage of CPU time used by this process. This is the ratio of 
CPU time to elapsed time. The CPU rate is also displayed in the 
bar graph. 

Wait Percentage of time the process is in the COM or COMO state. 

Time Amount of actual CPU time charged to the process. 

Home RAD Where most of the resources of the process reside. 
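
As a minimal sketch of how two of these fields can be read, the following example splits a Priority value in the xx/yy format and computes a Rate as the ratio of CPU time to elapsed time. The function names and sample numbers are assumptions made for illustration; they are not part of the Data Analyzer.

```python
def parse_priority(text):
    """Split a Priority value 'xx/yy' into (computable, base) integers."""
    computable, base = text.split("/")
    # int() ignores surrounding blanks, so '6/ 4' is accepted.
    return int(computable), int(base)


def cpu_rate(cpu_seconds, elapsed_seconds):
    """Percentage of CPU time used: CPU time divided by elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * cpu_seconds / elapsed_seconds


computable, base = parse_priority("6/ 4")
# Hypothetical interval: 2 seconds of CPU time in a 5-second interval.
rate = cpu_rate(2.0, 5.0)
print(computable, base, rate)
```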


Displaying Single Process Information 


When you double-click a PID on the lower part of an OpenVMS CPU Process 
Summary (Figure 3-8), Memory Summary (Figure 3-10), or I/O Summary 
(Figure 3-12) page, the Data Analyzer displays the first of several OpenVMS 
Single Process pages. 




On these pages, you can click tabs to display specific data about one process. 
Alternatively, you can display all of the information on the pages on a single 
vertical or horizontal grid page. 


This data includes a combination of data elements from the CPU Process, 
Memory, and I/O pages, as well as data for specific quota utilization, current 
image, and queue wait time. These pages are described in more detail in 
Section 3.3. 


The status bar in the OpenVMS CPU Process Summary pane (Figure 3-8) shows
the total number of processes on the node, the number that are listed, and the
number that are filtered out. The status bar is updated with each data collection.
The data collection rate is determined by the customization of CPU process
data collection intervals. See Section 7.5 for instructions on how to change data
collection intervals.


3.2.3 Memory Summaries and Details


The Memory Summary and Memory Details pages contain statistics about 
memory usage on the node you select. The Memory Summary pages displayed 
for OpenVMS and Windows nodes are somewhat different, as described in the 
following sections. The Memory Details page exists only for OpenVMS systems. 


3.2.3.1 Windows Memory Summary 


To display the Windows Memory Summary page, you can use either of the 
following methods: 


• Double-click a node, and then click the Memory tab (Figure 3-3).
• Double-click a value under the MEM heading (Figure 3-38).

The Data Analyzer displays the Windows Memory page (Figure 3-9).


Figure 3-9 Windows Memory 


[Screen capture: Windows Memory page with the Memory tab selected. Status bar: Windows NT Intel node AFFC36 - Memory]

Memory (127.42 Megabytes)

                   Current      Extreme
Available          93.91 MB     93.85 MB
Cache              10.66 MB     10.66 MB
Paged Pool          7.53 MB      7.52 MB
Nonpaged Pool       2.41 MB      2.40 MB
Committed Bytes    26.16 MB     26.17 MB
Commit Limit      243.13 MB


The Current and Extreme amounts on the page display the data shown in the 
following table. The table also indicates what the graph amounts represent. 

Data             Description

Available        Size (in bytes) of the virtual memory currently on the zeroed,
                 free, and standby lists. Zeroed and free memory are ready for
                 use, with zeroed memory cleared to zeros. Standby memory is
                 removed from a process's working set but is still available.
                 The graph shows the percentage of physical memory that is
                 available for use.

Cache            Number of bytes currently in use by the system cache. The
                 system cache is used to buffer data retrieved from disk or
                 LAN. The system cache uses memory not in use by active
                 processes on the computer. The graph shows the percentage of
                 physical memory devoted to the cache.

Paged Pool       Number of bytes in paged pool, a system memory area where
                 operating system components acquire space as they complete
                 their tasks. Paged pool pages can be paged out to the paging
                 file when the system does not access them for long periods of
                 time. The graph shows the percentage of physical memory
                 devoted to paged pool.

Nonpaged Pool    Number of bytes in nonpaged pool, a system memory area where
                 operating system components acquire space as they complete
                 their tasks. Nonpaged pool pages cannot be paged out to the
                 paging file; instead, they remain in memory as long as they
                 are allocated. The graph shows the percentage of physical
                 memory devoted to nonpaged pool.

Committed        Amount of available virtual memory (the Commit Limit) that is
Bytes            in use. Note that the commit limit can change if the paging
                 file is extended. The graph shows the percentage of the Commit
                 Limit used by the Committed Bytes.

Commit Limit     Size (in bytes) of virtual memory that can be committed
                 without having to extend the paging files. If the paging files
                 can be extended, this limit can be raised.
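
The Committed Bytes graph value can be reproduced with simple arithmetic. The sketch below assumes the current values from the sample Windows Memory page (26.16 MB committed and a 243.13 MB commit limit); the function name is hypothetical and is not part of the product.

```python
def percent_committed(committed_mb, commit_limit_mb):
    """Percentage of the Commit Limit used by the Committed Bytes."""
    if commit_limit_mb <= 0:
        raise ValueError("commit_limit_mb must be positive")
    return 100.0 * committed_mb / commit_limit_mb


# Sample values taken from the Windows Memory page example.
pct = percent_committed(26.16, 243.13)
print(f"{pct:.2f}% of the commit limit is in use")
```

If the paging file is extended, the commit limit rises and the same Committed Bytes value yields a lower percentage.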


3.2.3.2 OpenVMS Memory Summary 


When you double-click a value under the MEM heading in an OpenVMS Node 
pane, or if you click the Memory tab, the Data Analyzer displays the OpenVMS 
Memory Summary page (Figure 3-10). 


Alternatively, if you click the View menu on the OpenVMS Memory Summary 
page, the following options are displayed in a shortcut menu: 


• Memory Summary View

• Memory Details View


You can click Memory Summary View to select the Memory Summary page, 
shown in Figure 3-10. 

Figure 3-10 OpenVMS Memory Summary

[Screen capture: OpenVMS Memory Summary page. Status bar: OpenVMS Alpha node AFFS52 - Memory Usage - process view: 59 processes (58 listed, 1 filtered out)]


The graph in the top pane of Figure 3-10 shows memory distribution (Free, Used, 
and Modified) as absolute values, in megabytes of memory. Current and extreme 
values are also listed for each type of memory distribution. (Free memory uses 
the lowest seen value as its extreme.) Bad Pages show the number of pages that 
the operating system has marked as bad. 


The thresholds that you see in the graph are the ones set for the LOMEMY event. 
(The LOMEMY thresholds are also in the display of values for the MEM field in 
the OpenVMS Group/Node pane shown in Figure 2-25.) 


The lower pane in Figure 3-10 displays the data shown in the following table, 
including an abbreviation of the event that is related to each type of data, where 
applicable. 


Data            Description                                          Related Events

PID             Process identifier. A 32-bit value that uniquely     n/a
                identifies a process.

Process Name    Name of the process.                                 NOPROC,
                                                                     PRCFND

Count           Number of physical pages or pagelets of memory       LOWEXT
                that the process is using for the working set
                count.

Size            Number of pages or pagelets of memory the            LOWSQU
                process is allowed to use for the working set size
                (also known as the working set list size). The
                operating system periodically adjusts this value
                based on an analysis of page faults relative to
                CPU time used.

Extent          Number of pages or pagelets of memory in the         LOWEXT
                process's working set extent (WSEXTENT) quota
                as defined in the user authorization file (UAF).
                Number of pages or pagelets cannot exceed the
                value of the system parameter WSMAX.

Rate            Number of page faults per second for the process.    LOWSQU,
                                                                     LOWEXT,
                                                                     PRPGFL

I/O             Rate of I/O read attempts necessary to satisfy       PRPIOR
                page faults (also known as page read I/O or the
                hard fault rate).


When you double-click a PID on the lower part of the Memory Summary
page (Figure 3-10), the Data Analyzer displays an OpenVMS Single Process page
(Figure 3-23), where you can click tabs to display pages containing specific
data about one process. This data includes a combination of data from the CPU
Process, Memory, and I/O pages, as well as data for specific quota utilization,
current image, and queue wait time. These pages are described in Section 3.3.


The status bar in the Memory Summary page (Figure 3-10) shows the total 
number of processes on the node, the number that are listed, and the number 
that are filtered out. The status bar is updated with each data collection. The 
data collection rate is determined by the customization of memory data collection 
intervals. See Section 7.5 for instructions on how to change data collection 
intervals. 
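
If you copy status bar text into a script or log, the three counts can be pulled out with a regular expression. This is a sketch only: it assumes the wording seen in the sample figures ("59 processes (58 listed, 1 filtered out)"), and the pattern and function name are hypothetical rather than anything the Data Analyzer provides.

```python
import re

# Matches text such as "59 processes (58 listed, 1 filtered out)".
STATUS_RE = re.compile(
    r"(\d+) ([A-Za-z ]+?) \((\d+) listed, (\d+) filtered out\)"
)


def parse_status_bar(text):
    """Return the item kind and the total, listed, and filtered counts."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    total, kind, listed, filtered = match.groups()
    return {
        "kind": kind,
        "total": int(total),
        "listed": int(listed),
        "filtered": int(filtered),
    }


status = parse_status_bar(
    "OpenVMS Alpha node AFFS52 - Memory Usage - process view: "
    "59 processes (58 listed, 1 filtered out)"
)
print(status)
```

In the sample figures, the listed and filtered counts add up to the total, which gives a quick consistency check on the parsed values.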


3.2.3.3 OpenVMS Memory Details 


When you click the View menu on the OpenVMS Memory Summary page 
(Figure 3-10), the following options are displayed in a shortcut menu. To display 
memory details, select that option. 


• Memory Summary View
• Memory Details View (Alpha only)

The Data Analyzer displays the OpenVMS Memory Details page (Figure 3-11).

Figure 3-11 OpenVMS Memory Details

[Screen capture: OpenVMS Memory Details page. Status bar: OpenVMS Alpha node 2BOYS - Memory Usage and RAD breakdown.]


The following data items are in a box at the top left of the page: 


Heading                     Description

Successful Expansions       Number of successful nonpaged pool expansions.

Failed Expansions           Number of failed attempts to expand nonpaged pool.

System space replication    Whether system space replication is enabled or
                            disabled.


To the right of the box is a list of system memory data that is displayed in the
bar graphs at the bottom of the page. You can toggle these data items on or off
(that is, choose whether to display them as bar graphs). You can also click a
small box to choose between Linear and Logarithmic bar graph displays.


The system memory data items are described in Table 3-5. 

Table 3-5 System Memory Data

Data                        Description

Total memory                Total physical memory size, as seen by OpenVMS.

Available process memory    Amount of total physical memory available to
                            processes. This is the total memory minus memory
                            allocated to OpenVMS.

Free list                   Size of the free page list.

Modified list               Size of the modified page list.

Resident code region        Size of the resident image code region.

Reserved page count         Number of reserved memory pages.

Galactic shared used        Galaxy shared memory pages currently in use.

Galactic shared unused      Galaxy shared memory pages currently not in use.

Global read-only            Read-only pages, which are installed as resident
                            when system space replication is enabled, that will
                            also be replicated for improved performance.

Total nonpaged pool         Total size of system nonpaged pool.

Total free nonpaged pool    Amount of nonpaged pool that is currently free.


To the right of the system memory data is a list of single RAD data items, which
are described in Section 3.3.7. You can toggle these items to display them in bar
graphs.


Table 3-6 Single RAD Data Items 


Data Description 

Free list Size of the free page list. 

Modified list Size of the modified page list. 

Nonpaged pool Total size of system nonpaged pool. 

Free nonpaged pool Amount of nonpaged pool that is currently free. 


Below the list of single RAD items is a box where you can toggle between 
Percentage and Raw Data to display Current and Extreme values to the right 
of the bar graphs. 


3.2.4 OpenVMS I/O Summary and Page/Swap Files 


By clicking the I/O tab on any OpenVMS node data page, you can display a page 
that contains summaries of accumulated I/O rates. In the top pane, the summary 
covers all processes; in the lower pane, the summary is for one process. 


From the View menu, you can also choose to display (in the lower pane) a list of 
page and swap files. 


3.2.4.1 OpenVMS I/O Summary

The OpenVMS I/O Summary page displays the rate, per second, at which I/O
transfers take place, including paging write I/O (WIO), direct I/O (DIO), and
buffered I/O (BIO). In the top pane, the summary is for all CPUs; in the lower
pane, the summary is for one process.

When you double-click a data item under the DIO or BIO heading on the Node
pane, or if you click the I/O tab, by default, the Data Analyzer displays the
OpenVMS I/O Summary (Figure 3-12).


Figure 3-12 OpenVMS I/O Summary 
[Screen capture: OpenVMS I/O Summary page. The lower pane has the columns PID, Process Name, DIO Rate, BIO Rate, PIO Rate, Open Files, DIO Avail, BIO Avail, BYTLM Avail, and Files Avail]


The graph in the top pane represents the percentage of thresholds for the types 
of I/O shown in Table 3-7. The table also shows the event that is related to each 
data item. For information about setting event thresholds, see Section 7.8. 


Table 3-7 I/O Data Displayed

Type of I/O           I/O Description                                   Related
                                                                        Event

Paging Write I/O      Rate of write I/Os to one or more paging files.   HIPWIO
Rate

Direct I/O Rate       Transfers are from the pages or pagelets          HIDIOR
                      containing the process buffer that the system
                      locks in physical memory to the system devices.

Buffered I/O Rate     Transfers are for the process buffer from an      HIBIOR
                      intermediate buffer from the system buffer pool.

Total Page Faults     Total of hard and soft page faults on the         HITTLP
                      system, as well as peak values seen during a
                      Data Analyzer session.

Hard Page Faults      Total of hard page faults on the system.          HIHRDP

System Page Faults    Page faults generated by OpenVMS itself.          HISYSP

Window Turn Rate      Number of times that the file extent cache had    WINTRN
                      to be refreshed.


Current and peak values are listed for each type of I/O. Values that exceed 
thresholds set by the events indicated in the table are displayed in red on the 
screen. Appendix B describes OpenVMS and Windows events. 
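
The relationship between a current rate, its event threshold, and the graph bar can be sketched as follows. The threshold and rate values here are hypothetical (actual thresholds are defined in Event Configuration Properties), and the function names are illustrative rather than part of the product.

```python
def percent_of_threshold(current, threshold):
    """Express a current value as a percentage of its event threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return 100.0 * current / threshold


def exceeds_threshold(current, threshold):
    """True when the value would be displayed in red on the screen."""
    return current > threshold


# Hypothetical direct I/O rate of 15 per second against a threshold of 60.
bar = percent_of_threshold(15.0, 60.0)
in_red = exceeds_threshold(15.0, 60.0)
print(bar, in_red)
```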

To the right of the graph, the following values are listed:

Value        Description

Threshold    Defined in Event Configuration Properties.

Current      Current value or rate.

Peak         Highest value or rate seen since start of data collection.


The lower pane displays summary accumulated I/O rates on a per-process basis. 
The following data is displayed: 


Data            Description

PID             Process identifier. A 32-bit value that uniquely identifies a
                process.

Process Name    Name of the current process.

DIO Rate        Direct I/O rate. The rate at which I/O transfers occur between
                the system devices and the pages or pagelets that contain the
                process buffer that the system locks in physical memory.

BIO Rate        Buffered I/O rate. The rate at which I/O transfers occur
                between the process buffer and an intermediate buffer from the
                system buffer pool.

PIO Rate        Paging I/O rate. The rate of read attempts necessary to satisfy
                page faults (also known as page read I/O or the hard fault
                rate).

Open Files      Number of open files.

DIO Avail       Direct I/O limit remaining. The number of remaining direct I/O
                operations available before the process reaches its quota.
                DIOLM quota is the maximum number of direct I/O operations a
                process can have outstanding at one time.

BIO Avail       Buffered I/O limit remaining. The number of remaining buffered
                I/O operations available before the process reaches its quota.
                BIOLM quota is the maximum number of buffered I/O operations a
                process can have outstanding at one time.

BYTLM           The number of buffered I/O bytes available before the process
                reaches its quota. BYTLM is the maximum number of bytes of
                nonpaged system dynamic memory that a process can claim at one
                time.

Files           Open file limit remaining. The number of additional files the
                process can open before reaching its quota. The FILLM quota is
                the maximum number of files that can be opened simultaneously
                by the process, including active network logical links.


When you double-click a PID on the lower part of the I/O Summary page, the
Data Analyzer displays an OpenVMS Single Process page, where you can click tabs
to display specific data about one process. See Section 3.3 for more details.


The status bar in the OpenVMS I/O Summary page (Figure 3-12) shows the total 
number of processes on the node, the number that are listed, and the number that 
are filtered out. The status bar is updated with each data collection. The data 
collection rate is determined by the customization of I/O data collection intervals. 
See Section 7.5 for instructions on how to change data collection intervals. 

3.2.4.2 OpenVMS I/O Page/Swap Files

To display page and swap file data, click I/O Page/Swap Files on the View menu
of the I/O page. The Data Analyzer displays an OpenVMS I/O Page/Swap Files
page. The top pane displays the same information as that in the OpenVMS I/O
Summary page (Figure 3-12). The lower pane contains the I/O Page/Swap Files
pane shown in Figure 3-13.


Figure 3-13 OpenVMS I/O Page/Swap Files 


[Screen capture: I/O Page/Swap Files pane with columns Host Node, File Name, Used, % Used, Total, and Reservable. Status bar: OpenVMS VAX node MAWK - I/O Summary - memory file view: 2 memory files (1 listed, 1 filtered out)]


The I/O Page/Swap Files pane displays the following data: 


Data          Description

Host Name     Name of the node on which the page or swap file resides.

File Name     Name of the page or swap file. For secondary page or swap files,
              the file name is obtained by a special AST to the job controller
              on the remote node. The Data Analyzer makes one attempt to
              retrieve the file name.

Used          Number of used blocks in the file.

% Used        Of the available blocks in each file, the percentage that has
              been used.

Total         Total number of blocks in the file.

Reservable    The number of reservable blocks in each page or swap file
              currently installed. Reservable blocks are blocks that might be
              logically claimed by a process for future physical allocation. A
              negative value indicates that the file might be overcommitted.
              Although a negative value is not an immediate concern, it
              indicates that the file might become overcommitted if physical
              memory becomes scarce.


Notes 


OpenVMS Versions 7.3-1 and higher do not have a page or swap file 
“Reservable” field. The Data Analyzer displays N/A in the field for these 
versions of OpenVMS. 


If events for secondary page and swap files are signaled before the Data
Analyzer has resolved their file names from the file ID (FID), events such
as LOPGSP display the FID instead of file name information. You can
determine the file name for the FID by checking the File Name field in
the I/O Page/Swap Files page. The FID for the file name is displayed
after the file name.
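
The % Used and Reservable fields can be interpreted with a short sketch. The block counts are illustrative values of the kind shown in the sample pane, the function names are hypothetical, and `None` is used here as an assumed stand-in for a field that the Data Analyzer displays as N/A.

```python
def percent_used(used_blocks, total_blocks):
    """Percentage of the blocks in a page or swap file that are in use."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return 100.0 * used_blocks / total_blocks


def describe_reservable(reservable):
    """Interpret a Reservable value; None stands for a field shown as N/A."""
    if reservable is None:
        return "N/A"
    if reservable < 0:
        return "file might be overcommitted"
    return "reservable blocks remain"


# Illustrative values: 84974 of 199992 blocks used, negative reservable count.
pct = percent_used(84974, 199992)
print(f"{pct:.2f}% used, {describe_reservable(-41148)}")
```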


The status bar in the OpenVMS I/O Page/Swap Files pane (Figure 3-13) shows 
the total number of page and swap files on the node, the number that are listed, 
and the number that are filtered out. The status bar is updated with each 
data collection. The data collection rate is determined by the customization of 
page/swap data collection intervals. See Section 7.5 for instructions on how to 
change data collection intervals. 

3.2.5 Disk Summaries


The Disk tab on the Node Summary page (Figure 3-4) allows you to display disk
pages that contain data about availability, count, and errors of disk devices on
the system. OpenVMS disk data displays differ from those for Windows nodes, as
described in the following sections.


On OpenVMS pages, the View menu lets you choose the following disk 
summaries: 


• Status Summary
• Volume Summary

Also, on the Disk Status Summary, you can double-click a device name to display
a Single Disk Summary page.


3.2.5.1 OpenVMS Disk Status Summary 


To display the default disk page, the OpenVMS Disk Status Summary page
(Figure 3-14), click the Disk tab on the OpenVMS Node Summary page
(Figure 3-4). The Disk Status Summary page displays disk device data, including
path, volume name, status, and mount, transaction, error, and resource wait
counts.


Figure 3-14 OpenVMS Disk Status Summary 


[Screen capture: OpenVMS Disk Status Summary page with columns Device Name, Host Path, Volume Name, Status, Error, Trans, Mount, and RWait. Status bar: OpenVMS Alpha node ANDA2A - Disk Survey - Status: 315 disks (177 listed, 138 filtered ...]


Note 


Disk status data is accurate only if every node in an OpenVMS Cluster 
environment is in the same group. You might lose accuracy if you do not 
have all the nodes of a cluster in one group. 


To ensure that the disk status data is accurate for an OpenVMS Cluster, 
it is recommended that you enable background data collection for the disk 
status data. See Section 7.5 on how to do this. 

This summary displays the following data:

Heading        Description

Device Name    Standard OpenVMS device name that indicates where the device is
               located, as well as a controller or unit designation.

Host Path      Primary path (node) from which the device receives commands.

Volume Name    Name of the mounted media.

Status         One or more of the following disk status values:

               Alloc           Disk is allocated to a specific user.
               CluTran         Disk status is uncertain because of a cluster
                               state transition in progress.
               Dismount        Disk in process of dismounting; may be
                               waiting for a file to close.
               Foreign         Disk is mounted with the /FOREIGN qualifier.
               Invalid         Disk is in an invalid state (most likely Mount
                               Verify Timeout).
               MntVerify       Disk is waiting for a mount verification.
               Mounted         Disk is logically mounted by a MOUNT command.
               Offline         Disk is no longer physically mounted in
                               device drive.
               Online          Disk is physically mounted in device drive.
               Shadow Set      Disk is a member of a shadow set.
               Member
               Unavailable     Disk is set to unavailable.
               Wrong Volume    Disk was mounted with the wrong volume name.
               Wrtlck          Disk is mounted and write locked.

Error          Number of errors generated by the disk (a quick indicator of
               device problems).

Trans          Number of in-progress file system operations for the disk.

Mount          Number of nodes that have the specified disk mounted. (These
               nodes must have the Data Collector installed and running to
               participate in the mount count.)

Rwait          Indicator that a system I/O operation is stalled, usually during
               normal recovery from a connection failure or during volume
               processing of host-based shadowing.


The status bar in the OpenVMS Disk Status Summary (Figure 3-14) shows
the total number of volumes on the node, the number that are listed, and the
number that are filtered out. The status bar is updated with each data collection.
The data collection rate is determined by the customization of disk status data
collection intervals. See Section 7.5 for instructions on how to change data
collection intervals.



3.2.5.2 OpenVMS Single Disk Summary

To collect single disk data and display the data on the Single Disk Summary,
double-click a device name on the Disk Status Summary. Figure 3-15 is an
example of a Single Disk Summary page. The display interval of the data
collected is 5 seconds.


Note that you can sort the order in which data is displayed in the Single Disk
Summary page by clicking a column header. To reverse the sort order of a column
of data, click the column header again.


Figure 3-15 OpenVMS Single Disk Summary 


[Screen capture: Single Disk Summary page listing nodes AMDS5 through KOINE3, with columns Node, Status, Errors, Trans, RWait, Free, QLen, and OpRate]


This summary displays the following data: 


Data      Description

Node      Name of the node.

Status    Status of the disk: mounted, online, offline, and so on.

Errors    Number of errors on the disk.

Trans     Number of in-progress file system operations on the disk (number of
          open files on the volume).

Rwait     Indication of an I/O stalled on the disk.

Free      Number of free disk blocks on the volume.

QLen      Average number of operations in the I/O queue for the volume.

OpRate    Each node's contribution to the total operation rate (number of I/Os
          per second) for the disk.

3.2.5.3 OpenVMS Disk Volume Summary


By using the View option on the Disk Status Summary page (Figure 3-14), you 
can select the Volume Summary option to display the OpenVMS Disk Volume 
Summary (Figure 3-16). This page displays disk volume data, including path, 
volume name, disk block utilization, queue length, and operation rate. 


Figure 3-16 OpenVMS Disk Volume Summary 


[Screen capture: OpenVMS Disk Volume Summary page with columns Device Name, Host Path, Volume Name, Used, % Used, Free, Queue, OpRate, Physical Size, Volume Size, and Volume Limit. Status bar: OpenVMS Alpha node AFFS451 - Disk Survey - Volumes: 11 volumes (6 listed, 5 filtered out)]


Note 


Disk volume data is accurate only if every node in an OpenVMS Cluster 
environment is in the same group. You might lose accuracy if you do not 
have all the nodes of a cluster in one group. 


To ensure that the disk volume data is accurate for an OpenVMS Cluster, 
it is recommended that you enable background data collection for the disk 
volume data. See Section 7.5 on how to do this. 


The Disk Volume Summary page displays the data described in the following 
table. (The last two columns, Volume Size and Volume Limit, are displayed only 
on OpenVMS Version 7.3-2 and later systems.) 


Data            Description

Device Name     Standard OpenVMS device name that indicates where the device is
                located, as well as a controller or unit designation.

Host Path       Primary path (node) from which the device receives commands.

Volume Name     Name of the mounted media.

Used            Number of blocks on the volume that are in use.

% Used          Percentage of the number of volume blocks in use in relation to
                the total volume blocks available.

Free            Number of blocks of volume space available for new data from the
                perspective of the node that is mounted.

Queue           Average number of I/O operations pending for the volume (an
                indicator of performance; less than 1.00 is optimal).

OpRate          Operation rate for the most recent sampling interval. The rate
                measures the amount of activity on a volume. The optimal load is
                device specific.

Physical Size   Total number of blocks on the current physical disk device. This
                is the "Total Blocks" field of the $ SHOW DEVICE/FULL display.

Volume Size     Current number of blocks available for file allocation. This is
                the "Logical Volume Size" field of the $ SHOW DEVICE/FULL
                display. (For more information, see $ SET VOLUME/SIZE.) This
                column is displayed only on OpenVMS Version 7.3-2 and later
                systems.

Volume Limit    Maximum number of blocks the volume can reach using Dynamic
                Volume Expansion. This is the "Expansion Size Limit" field of
                the $ SHOW DEVICE/FULL display. (For more information, see
                $ SET VOLUME/LIMIT.) This column is displayed only on OpenVMS
                Version 7.3-2 and later systems.
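The relationship between the Used, Free, Volume Size, and Volume Limit columns can be illustrated with a short Python sketch. It is not part of the Availability Manager; the function names are hypothetical, and it assumes that the total number of volume blocks equals Used plus Free, which the table implies but does not state.

```python
def percent_used(used_blocks: int, free_blocks: int) -> float:
    """Percentage of volume blocks in use, assuming total = used + free."""
    total = used_blocks + free_blocks
    if total == 0:
        return 0.0
    return 100.0 * used_blocks / total


def expansion_headroom(volume_size: int, volume_limit: int) -> int:
    """Blocks by which the volume could still grow (never negative)."""
    return max(volume_limit - volume_size, 0)


# Illustrative values only; they are not taken from a real volume.
print(percent_used(750_000, 250_000))            # 75.0
print(expansion_headroom(1_000_000, 4_000_000))  # 3000000
```

A headroom of zero would suggest that the volume has already reached its expansion size limit.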


If the Data Analyzer detects that a disk volume size has increased, a VLSZCH
event is signaled:


AFFS55 Volume size of device $8$DKA200 (OPAL-X9U6) has changed


In this message, AFFS55 is the node name, $8$DKA200 is the device name, and
OPAL-X9U6 is the volume name.


The status bar in the OpenVMS Disk Volume Summary (Figure 3-16) shows
the total number of volumes on the node, the number that are listed, and the
number that are filtered out. The status bar is updated with each data collection.
The data collection rate is determined by the customization of disk volume data
collection intervals. See Section 7.5 for instructions on how to change data
collection intervals.


3.2.5.4 Windows Logical and Physical Disk Summaries 
On Windows nodes, the View menu lets you choose the following summaries: 


• Logical Disk Summary
• Physical Disk Summary


Windows Logical Disk Summary 


A logical disk is the user-definable set of partitions under a drive letter. The 
Windows Logical Disk Summary displays logical disk device data, including path, 
label, percentage used, free space, and queue statistics. 


To display the Logical Disk Summary page, follow these steps: 


1. Double-click a node name in the Node pane to display the Windows Node 
Summary. 


2. Click the Disk tab on the Windows Node Summary. 


The Data Analyzer displays the Windows Logical Disk Summary page
(Figure 3-17).


Figure 3-17 Windows Logical Disk Summary

[Screenshot: window for node AFFC36 with the File, View, and Customize menus
and the Node Summary, CPU, Memory, and Disk tabs. The status bar reads
"Windows NT Intel node AFFC36 - Logical Disk Summary".]


This summary displays the following data: 


Data            Description

Disk            Drive letter, for example, c:, or Total, which is the summation
                of statistics for all the disks.

Path            Primary path (node) from which the device receives commands.

Label           Identifying label of a volume.

Type            File system type; for example, FAT or NTFS.

% Used          Percentage of disk space used.

Free            Amount of free space available on the logical disk unit.

Current Queue   Number of requests outstanding on the disk at the time the
                performance data is collected. It includes requests in progress
                at the time of data collection.

Average Queue   Average number of both read and write requests that were queued
                for the selected disk during the sample interval.

Transfers/Sec   Rate of read and write operations on the disk.

KBytes/Sec      Rate data is transferred to or from the disk during write or
                read operations. The rate is displayed in kilobytes per second.

% Busy          Percentage of elapsed time that the selected disk drive is busy
                servicing read and write requests.


Windows Physical Disk Summary 

A physical disk is hardware used on your computer system. The Windows 
Physical Disk Summary displays disk volume data, including path, label, queue 
statistics, transfers, and bytes per second. 


To display the Windows Physical Disk Summary, follow these steps: 


1. Click the View menu on the Windows Logical Disk Summary. 


2. Click the Physical Disk Summary menu option. 


The Data Analyzer displays the Windows Physical Disk Summary page
(Figure 3-18).


Figure 3-18 Windows Physical Disk Summary

[Screenshot: window for node AFFC53 with the File, View, Fix, Customize, and
Help menus. The status bar reads "Windows NT Intel node AFFC53 - Physical Disk
Summary".]


This page displays the following data: 


Data            Description

Disk            Drive number, for example, 0, 1, 2, or Total, which is the
                summation of statistics for all the disks.

Path            Primary path (node) from which the device receives commands.

Current Queue   Number of requests outstanding on the disk at the time the
                performance data is collected; it includes requests in service
                at the time of data collection.

Average Queue   Average number of read and write requests that were queued for
                the selected disk during the sample interval.

Transfers/Sec   Rate of read and write operations on the disk.

KBytes/Sec      Rate at which data is transferred to or from the disk during
                read or write operations. The rate is displayed in kilobytes
                per second.

% Busy          Percentage of elapsed time the selected disk drive is busy
                servicing read and write requests.

% Read Busy     Percentage of elapsed time the selected disk drive is busy
                servicing read requests.

% Write Busy    Percentage of elapsed time the selected disk drive is busy
                servicing write requests.


3.2.6 OpenVMS Lock Contention 


To display the OpenVMS Lock Contention page, click the Lock Contention tab on
the OpenVMS Node Summary page (Figure 3-4). For all the nodes in the group
you have selected, the Lock Contention page displays each resource for which a
lock contention problem might exist.


Note 


Lock contention data is accurate only if every node in an OpenVMS 
Cluster environment is in the same group. You might lose accuracy if you 
do not have all the nodes of a cluster in one group. 


To ensure that the lock contention data is accurate for an OpenVMS 
Cluster, it is recommended that you enable background data collection for 
the lock contention data. See Section 7.5 on how to do this.


3.2.6.1 Lock Contention Page in Decoded Format


Figure 3-19 shows a sample Lock Contention page containing resource names in 
decoded format, which is the default. 


Figure 3-19 OpenVMS Lock Contention (Decoded Format)

[Screenshot: Lock Contention page for OpenVMS Alpha node AFFS11, with the
columns Resource Name, Master Node, Parent Resource Name, Duration,
Gr/Cv/Wt/St, and Status. A tooltip for the resource REG$MASTER_LOCK shows the
fields RSB, Node, Parent, Duration, Status, and ValBlk dump.]


(You can display a tooltip similar to the one shown in Figure 3-19 by holding the 
cursor on a resource line. See the Note in the introduction to this chapter for 
further details.) 


By selecting the View menu (on the Lock Contention page), followed by the 
Resource names menu item, you can choose to display the resource name and 
parent resource name in either of two formats: 


• Raw format (the format that SDA uses)
• Decoded format (the default format)


Figure 3-19 displays the resource names in decoded format. (The Data Analyzer 
decodes common resource names.) 


The Lock Contention page displays the data described in Table 3-8. Numbered 
lines correspond to lines or items of data in the Lock Contention Log 
(Example 3-1). 


Table 3-8 Data on the OpenVMS Lock Contention Page

Lock Log
Reference
Number      Data              Description

1           Resource Name     Resource name associated with the $ENQ system
                              service call.

2           Master Node       Node on which the resource is mastered.

3           Parent Resource   Name of the parent resource. No name is displayed
                              when a parent resource does not exist.

4           Duration          Time elapsed since the Data Analyzer first
                              detected the contention situation.

5           Gr/Cv/Wt/St       Total number of locks in each of four states.
                              Numbers for these states appear only when you are
                              collecting lock data. The states are:

                              • Granted
                              • Converting
                              • Waiting
                              • Stalled

                              Stalled indicates one of several states whenever
                              a lock is waiting for a response from another
                              node in the cluster.

6           Status            Status of the lock. See the $ENQW description of
                              flags in the HP OpenVMS System Services Reference
                              Manual.

The tooltip that is displayed when you hold the cursor over a line of data in
Figure 3-19 contains the data described in Table 3-8, as well as the information
described in Table 3-9.


Table 3-9 Lock Contention Tooltip Data

Reference
Number      Data           Description

7           RSB            Address of the Resource Block

8           ValBlk dump    Resource Value Block dump in standard OpenVMS dump
                           format
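The dump lines in the tooltip list hexadecimal longwords that are read from right to left, while the text column is read from left to right. The following Python sketch shows one way to recover the text column from the hexadecimal column. It is an illustration only: the function name is hypothetical, and it assumes each dump line holds whole bytes with the lowest address at the right-hand end, as in the sample tooltip line "53464641 00000117".

```python
def decode_dump_line(hex_groups: str) -> str:
    """Return the text column for one line of an OpenVMS-style dump.

    The hex groups are written with the lowest address on the right,
    so the byte sequence is reversed to obtain memory order.
    Unprintable bytes are shown as periods.
    """
    raw = bytes.fromhex(hex_groups.replace(" ", ""))
    memory_order = raw[::-1]
    return "".join(chr(b) if 32 <= b < 127 else "." for b in memory_order)


print(decode_dump_line("53464641 00000117"))  # ....AFFS
```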


3.2.6.2 Lock Contention Page in Raw Format 


Figure 3-20 shows the Lock Contention page with resource name data displayed
in raw format. It also shows the tooltip that is displayed when you hold the
cursor over a line of data.


Figure 3-20 OpenVMS Lock Contention (Raw Format)

[Screenshot: Lock Contention page for OpenVMS Alpha node AFFS11, with the
columns Resource Name, Master Node, Parent Resource Name, Duration,
Gr/Cv/Wt/St, and Status. Resource names and parent resource names appear in
raw format, for example, QMAN$JBC_ALIVE_01 with a parent name that begins
QMAN$MSR_. A tooltip for the resource REG$MASTER_LOCK shows the fields RSB,
Node, Parent, Duration, Status, and ValBlk dump.]


In Figure 3-20, notice that a period is substituted for each unprintable character
in the Resource Name and Parent Resource Name fields.


3.2.6.3 Lock Block Data 


When you click the handle that precedes any line of resource data, the Data
Analyzer displays the lock block data that is shown in Figure 3-21 and
Figure 3-22.


Figure 3-21 OpenVMS Lock Block Data

[Screenshot: Lock Contention page with several resource lines expanded. Under
each expanded resource, one line per lock shows the columns Node, State,
Process Name, LKID, Mode, Duration, and Flags.]


Figure 3-22 OpenVMS Lock Block Data (Retry Stalled State)

[Screenshot: Lock Contention page for OpenVMS Alpha node KOINE with two
resource lines expanded. One lock, owned by process TP_SERVER, is in the Retry
state; the locks owned by JOB_CONTROL and QUEUE_MANAGER are in the Granted and
Waiting states.]


The lock block data in these two figures includes additional lock information 
under the headings shown in Table 3-10. Numbered lines correspond to lines or 
items of data in the Lock Contention Log (Example 3-1). 


Table 3-10 Lock Block Data

Reference
Number      Data           Description

9           Node           Node name on which the lock is granted.

10          State          One of the following:

                           Color       Meaning

                           Green       Granted

                           Yellow      Converting

                           Pink        Waiting

                           Pale grey   Stalled states that are visible:

                                       SCSWAIT: A transient state indicating
                                       that a lock message has been sent to the
                                       node with the master lock and a response
                                       is awaited.

                                       RETRY: A transient state seen only under
                                       error conditions that require that a
                                       lock message be resent. This can occur
                                       if the node to which a lock message was
                                       sent goes down before a response from it
                                       is received or if resources for sending
                                       a message cannot be allocated.

11          Process Name   Name of the process that owns the blocking lock.

12          LKID           Lock ID value (which is useful with SDA).

13          Mode           One of the following modes in which the lock is
                           granted or requested: (1)

                           CR   Concurrent read    Grants read access and
                                                   allows resource sharing with
                                                   other readers and writers.

                           CW   Concurrent write   Grants write access and
                                                   allows resource sharing with
                                                   other groups.

                           EX   Exclusive          Grants write access and
                                                   prevents resource sharing
                                                   with any other readers or
                                                   writers.

                           NL   Null               Grants no access; used as an
                                                   indicator of interest or a
                                                   placeholder for future lock
                                                   conversion.

                           PR   Protected read     Grants read access and
                                                   allows resource sharing with
                                                   other readers, but not
                                                   writers.

                           PW   Protected write    Grants write access and
                                                   prevents resource sharing
                                                   with any other readers or
                                                   writers.

                           If one mode is displayed, it is the Granted mode; if
                           two modes are displayed, the first is the Granted
                           mode and the second is the Converting mode.

14          Duration       Length of time the lock has been in the current
                           queue since the console application found the lock.

15          Flags          Flags specified with the $ENQW request. See the
                           $ENQW entry in the HP OpenVMS System Services
                           Reference Manual.

(1) Descriptions are from Goldenberg, Ruth, and Saravanan, Saro, OpenVMS AXP
Internals and Data Structures, Version 1.5, Digital Press, 1994.
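The rule for reading the Mode column (one mode means Granted; two modes mean Granted followed by Converting) can be expressed as a short Python sketch. The function and dictionary names are hypothetical and are not part of the product; the mode abbreviations are the six listed in Table 3-10.

```python
from typing import Optional, Tuple

# Mode abbreviations and names as listed in Table 3-10.
LOCK_MODES = {
    "NL": "Null",
    "CR": "Concurrent read",
    "CW": "Concurrent write",
    "PR": "Protected read",
    "PW": "Protected write",
    "EX": "Exclusive",
}


def parse_mode(field: str) -> Tuple[str, Optional[str]]:
    """Return (granted_mode, converting_mode) for a Mode column value.

    A single mode such as "EX" is the Granted mode. A pair such as
    "NL/PW" is the Granted mode followed by the Converting mode.
    """
    parts = field.strip().upper().split("/")
    if len(parts) not in (1, 2) or any(p not in LOCK_MODES for p in parts):
        raise ValueError(f"unrecognized mode field: {field!r}")
    converting = parts[1] if len(parts) == 2 else None
    return parts[0], converting


print(parse_mode("NL/PW"))  # ('NL', 'PW')
print(parse_mode("EX"))     # ('EX', None)
```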


To interpret the information displayed on the OpenVMS Lock Contention 
page, you need to understand OpenVMS lock management services. For more 
information, see the HP OpenVMS System Services Reference Manual. 


3.2.6.4 Lock Block Log File 


Example 3-1 contains an excerpt of a lock block log file. You can find a lock block 
log file in either of the following locations: 


System     File Name                        Location

Windows    AvailManLock.log                 Installation directory

OpenVMS    AvailManLock.log, prefaced by    Directory to which the AMDS$AM_LOG
           AMDS$AM_LOG                      logical points


Numbers preceding lines or items of data in Example 3-1 correspond to numbered
lines in Table 3-8, Table 3-9, and Section 3.2.6.3. Table 3-11 contains lines or
items of data in a lock block log file that are not described in the other tables in
this section.


Table 3-11 Additional Data in the Lock Block Log File

Lock Log
Reference
Number      Data from Example      Description

16          Reason for logging     In the example, the reason for logging is
                                   "the number of locks has changed." Other
                                   reasons include the "initial discovery of
                                   resource contention" or "lock data
                                   collection has been turned on."

17          GGMODE/CGMODE          Lock has been Granted/Lock is Converting.

18          Resource Name Dump     OpenVMS style of Resource Name dump.

19          RDB global database    Decoded Resource Name.
            name resource

20          Parent Resource        OpenVMS style of Parent Resource Name dump.
            Name Dump

21          RDB global database    Decoded Parent Resource Name.
            name resource

22          Lock data is being     The handle preceding a line of lock data has
            collected              been turned.

23          Master copy info.      Remote node that contains the master copy of
            Remote Node            the lock. If "Local Copy," only one node is
                                   interested in the lock.

24          Master copy info.      Lock ID of remote node that contains the
            Remote Lock ID         master copy of the lock.


Example 3-1 Lock Block Log File

*****************************************************************
Time: 11-Nov-2003 14:54:13.656
16) Reason for logging: Number of locks has changed
 2) Master Lock Node:   ALTOS
 1) Resource Name:      Tats. sca
17) GGMODE/CGMODE:      EX/EX
 6) Status:             VALID
 7) RSB Address:        FFFFFFFE.889F1580
18) Resource Name Dump (includes initial count byte):
      0000: 000200 00004906  .T.....
 8) Value Block Dump:
      0000: 00000000 00000000  ........
      0008: 00000000 00000000  ........
19) Rdb Remote monitor resource
 3) Parent Resource Name: Y...D....VDEROOT Raed
 7) RSB Address:          FFFFFFFE.8847DB80
20) Resource Name Dump (includes initial count byte):
      0000: 00004400 0000DD1C  .....D..
      0008: 4F4F5245 44560200  ..VDEROO
      0010: A0002020 20202054  T     ..
      0018:       00 00000237  7....
 8) Value Block Dump:
      0000: 00000000 00000000  ........
      0008: 00000000 00000000  ........
21) Rdb global database name resource
      Disk volume name: VDEROOT
      FID for file: (14240,2,0)
22) Lock data is being collected
 5) Granted lock count:    1
 5) Conversion lock count: 0
 5) Waiting lock count:    4
 5) Stalled lock count:    0

10)      9)                11)              12)       13)    Master copy info:       15)
Lock     Node   Process    Process          Lock      Gr/Cv  Remote        Remote    Flags
State           PID        Name             ID        Mode   Node          Lock ID
                                                             23)           24)
Granted  ALTOS  28E00441   RDMS_MONITOR70   04014B37  EX     (Local copy)            NQUE SYNC SYS
Waiting  ALTOS  2880023F   RDMS_MONITOR70   4C0065B5  PR     TSAVO         32005001  SYNC SYS NDLW
Waiting  ALTOS  00000000   (EPID=28A0023D)  4C0144C4  PR     ETOSHA        74005E36  SYNC SYS NDLW
Waiting  ALTOS  28C00448   RDMS_MONITOR70   1D0144A3  PR     CHOBE         77005906  SYNC SYS NDLW
Waiting  ALTOS  28E026C3   VDESKEPT126A3    01014B2D  PR     (Local copy)            SYS NDLW
*****************************************************************
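If you want to summarize many log entries, the lock count lines are a convenient place to start. The following Python sketch extracts the four counts from a block of log text. It is an assumption-laden illustration, not a documented interface: the function name is hypothetical, the line layout is inferred only from the excerpt in Example 3-1, and the leading reference numbers such as "5)" are treated as optional because they are annotations added for this manual.

```python
import re

# Matches lines such as "Granted lock count: 1", with or without a
# leading documentation reference number like "5)".
COUNT_LINE = re.compile(
    r"^\s*(?:\d+\)\s*)?(Granted|Conversion|Waiting|Stalled) lock count:\s*(\d+)\s*$"
)


def lock_counts(log_text: str) -> dict:
    """Collect the lock counts found in one log entry, keyed by state."""
    counts = {}
    for line in log_text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(1).lower()] = int(match.group(2))
    return counts


entry = """\
22) Lock data is being collected
 5) Granted lock count:    1
 5) Conversion lock count: 0
 5) Waiting lock count:    4
 5) Stalled lock count:    0
"""
print(lock_counts(entry))
```

For the entry shown, the result maps granted to 1, conversion to 0, waiting to 4, and stalled to 0.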


3.3 OpenVMS Single Process Data 


When you double-click a row in the lower part of an OpenVMS Mode Details
(Figure 3-7), OpenVMS CPU Process Summary (Figure 3-8), Memory
(Figure 3-10), or I/O (Figure 3-12) page, the Data Analyzer displays the
first of several OpenVMS Single Process pages.


Alternatively, you can right-click a row and select “Display...”. The View menu 
item contains three display options, shown in Figure 3-23. 


Figure 3-23 Single Process Window

[Screenshot: Single Process window with the View menu open, showing the Tabs,
Vertical Grid, and Horizontal Grid options. The status bar shows the current
image.]


Explanations of the choices in the View menu are the following:

• Tabs: individual tabs for each Single Process display:

  - Process Information
  - Working Set
  - Execution Rates
  - Process Quotas
  - Wait States
  - Job Quotas
  - RAD Counters

• Vertical Grid: all of the Single Process displays combined in one
  vertically-oriented grid

• Horizontal Grid: all of the Single Process displays combined in one
  horizontally-oriented grid


The following sections describe the individual tabs or sections of the vertical or 
horizontal grids. 


Each section refers to the vertical grid display shown in Figure 3-24. The status
bar displays the current image that the process is running.


Figure 3-24 Single Process Vertical Grid Display

[Screenshot: Single Process window in vertical grid format, with the sections
Process Information, Working Set, Execution Rates, Process Quotas, Wait States,
Job Quotas, and RAD Counters. The status bar shows the current image.]


3.3.1 Process Information 


Table 3-12 describes the Process Information data shown in Figure 3-24. 


The data on this page is displayed at the default intervals shown for Single 
Process Data on the Data Collection Customization page. 


Table 3-12 Process Information

Data            Description

Process name    Name of the process.

Username        User name of the user who owns the process.

Account         Account string that the system manager assigns to the user.

UIC             User identification code (UIC). A pair of numbers or character
                strings that designate the group and user.

PID             Process identifier. A 32-bit value that uniquely identifies a
                process.

Owner ID        Process identifier of the process that created the process
                displayed on the page. If the PID is 0, then the process is a
                parent process.

PC              Program counter. On OpenVMS Alpha systems, this value is
                displayed as 0 because the data is not readily available to the
                Data Collector node.

PS              Processor status longword (PSL). This value is displayed on VAX
                systems only.

Priority        Computable and base priority of the process. Priority is an
                integer between 0 and 31. Processes with higher priority are
                given more CPU time.

State           One of the process states listed in Appendix A.

CPU Time        CPU time used by the process.

3.3.2 Working Set 


Table 3-13 describes the Working Set data shown in Figure 3-24. 


Table 3-13 Working Set

Data               Description

WS Global Pages    Shared data or code between processes, listed in pages
                   (measured in pagelets).

WS Private Pages   Amount of accessible memory, listed in pages (measured in
                   pagelets).

WS Total Pages     Sum of global and private pages (measured in pagelets).

WS Size            Working set size. The number of pages (measured in pagelets)
                   of memory the process is allowed to use. This value is
                   periodically adjusted by the operating system based on
                   analysis of page faults relative to CPU time used. Increases
                   in large units indicate that a process is taking many page
                   faults and that its memory allocation is increasing.

WS Default         Working set default. The initial limit of the number of
                   physical pages (measured in pagelets) of memory the process
                   can use. This parameter is listed in the user authorization
                   file (UAF); discrepancies between the UAF value and the
                   displayed value are due to page/longword boundary rounding
                   or other adjustments made by the operating system.

WS Quota           Working set quota. The maximum number of physical pages
                   (measured in pagelets) of memory the process can lock into
                   its working set. This parameter is listed in the UAF;
                   discrepancies between the UAF value and the displayed value
                   are due to page/longword boundary rounding or other
                   adjustments made by the operating system.

WS Extent          Working set extent. The maximum number of physical pages
                   (measured in pagelets) of memory the system will allocate
                   for the process. The system provides memory to a process
                   beyond its quota only when it has an excess of free pages;
                   this memory can be recalled if necessary. This parameter is
                   listed in the UAF; any discrepancies between the UAF value
                   and the displayed value are due to page/longword boundary
                   rounding or other adjustments made by the operating system.

Images Activated   Number of times an image is activated.

Mutexes Held       Number of mutual exclusions (mutexes) held. Persistent
                   values other than zero (0) require analysis. A mutex is
                   similar to a lock but is restricted to one CPU. When a
                   process holds a mutex, its priority is temporarily increased
                   to 16.


3.3.3 Execution Rates 
Table 3-14 describes the Execution Rates data shown in Figure 3-24. 


Table 3-14 Execution Rates 


Data Description 


CPU Percent of CPU time used by this process. The ratio of CPU time to 
elapsed time. 


Direct I/O Rate at which I/O transfers take place from the pages or pagelets 
containing the process buffer that the system locks in physical memory 
to the system devices. 


Buffered I/O Rate at which I/O transfers take place for the process buffer from an 
intermediate buffer from the system buffer pool. 


Paging I/O Rate of read attempts necessary to satisfy page faults. This is also
known as page read I/O or the hard fault rate.


Page Faults Page faults per second for the process. 


3.3.4 Process Quotas 
Table 3-15 describes the Process Quotas data shown in Figure 3-24. 


Note that when you display the SWAPPER process, no values are listed in this 
section. The SWAPPER process does not have quotas defined in the same way as 
other system and user processes do. 


Table 3-15 Quotas 


Data Description 

Direct I/O The current number of direct I/Os used compared with the limit 
possible. 

Buffered I/O The current number of buffered I/Os used compared with the possible 
limit. 

ASTs Asynchronous system traps. The current number of ASTs used
compared with the possible limit.


CPU Time Amount of time used compared with the possible limit. "No Limit" is 
displayed if the limit is zero. 
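The Current and Limit columns can be combined into a single utilization figure, as in the following Python sketch. The function name is hypothetical, and the sketch assumes that Current is the amount consumed, which the table suggests but does not state outright. A limit of zero is reported as "No Limit", matching the CPU Time row.

```python
def describe_quota(name: str, current: int, limit: int) -> str:
    """Format a quota as 'current of limit (percent)'.

    A limit of zero is reported as "No Limit", as on the
    Process Quotas display.
    """
    if limit == 0:
        return f"{name}: {current} (No Limit)"
    percent = 100.0 * current / limit
    return f"{name}: {current} of {limit} ({percent:.1f}%)"


# Illustrative values only.
print(describe_quota("Direct I/O", 15, 150))  # Direct I/O: 15 of 150 (10.0%)
print(describe_quota("CPU Time", 1361, 0))    # CPU Time: 1361 (No Limit)
```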


3.3.5 Wait States 
Table 3-16 describes the Wait States data shown in Figure 3-24. 


In the graph, “Current” refers to the percentage of elapsed time each process 
spends in one of the computed wait states. If a process spends all its time waiting 
in one state, the total gradually reaches 100%. 




How Wait States are Calculated 

The wait state specifies why a process cannot execute, based on calculations 
made on collected data. Each value is calculated over an entire data collection 
period of approximately 2 minutes. The graph shows, over this period of time, 
the percentage of time a process spends in each wait state. Each value is an 
exponential average that approximates a moving average. A more detailed 
explanation follows. 


When monitoring of a single process starts, all wait state values are zero. When
the system periodically checks the process, the system first subtracts 10% from
each value. It then adds a value of 10 to the wait state the process is currently
in, if any.


For example, at the start, if a process is found to be in the Control wait state, the 
graph immediately registers 10 for Control. If the process is still in the Control 
wait state the next time it is checked, the graph shows Control at 19. This value 
is 90% of the original 10 (or 9), plus 10 (the value currently being added). 


The next time the process is checked, if it is found to be in the Buffered I/O wait 
state, Buffered I/O is set to 10 and Control is set to 17 (approximately 90% of the 
previous value of 19). 


The following time the process is checked, if it is not in a wait state at all, 
Buffered I/O is set to 9 (90% of 10), and Control is set to 15 (90% of 17). 
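The calculation described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the product's code; the function name is hypothetical, and the displayed values are assumed to be rounded to whole numbers.

```python
WAIT_STATES = ("Compute", "Memory", "Direct I/O", "Buffered I/O",
               "Control", "Quotas", "Explicit")


def update_wait_states(values, current_state=None):
    """Apply one periodic check: subtract 10% from every value, then add
    10 to the wait state the process is in (if it is in one)."""
    updated = {state: value * 0.9 for state, value in values.items()}
    if current_state is not None:
        updated[current_state] += 10
    return updated


# Reproduce the worked example: Control, Control, Buffered I/O, no wait state.
values = {state: 0.0 for state in WAIT_STATES}
for observed in ("Control", "Control", "Buffered I/O", None):
    values = update_wait_states(values, observed)
    print(round(values["Control"]), round(values["Buffered I/O"]))
# Prints: 10 0, then 19 0, then 17 10, then 15 9
```

Because each check keeps 90% of the old value and adds 10, a process that stays in one wait state approaches 100, which is why the total gradually reaches 100%.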


Appendix A contains descriptions of wait states. 


Table 3-16 Wait States 


Data Description 

Compute Average percentage of time that the process is waiting for CPU time. 
Possible states are COM, COMO, or RWCAP. 

Memory Average percentage of time that the process is waiting for a page fault
that requires data to be read from disk; this is common during image
activation. Possible states are PFW, MWAIT, COLPG, FPG, RWPAG,
RWNPG, RWMPE, or RWMPB.


Direct I/O Average percentage of time that the process waits for data to be read 
from or written to a disk or tape. The possible state is DIO. 


Buffered I/O Average percentage of time that the process waits for data to be read 
from or written to a slower device such as a terminal, line printer, 
mailbox, or network traffic. The possible state is BIO. 


Control Average percentage of time that the process is waiting for another 
process to release control of some resource. Possible states are CEF, 
MWAIT, LEF, LEFO, RWAST, RWMBX, RWSCS, RWCLU, RWCSV, 
RWUNK, or LEF waiting for an ENQ. 


Quotas Average percentage of time that the process is waiting because the 
process has exceeded some quota. Possible states are QUOTA or 
RWAST_QUOTA. 


Explicit Average percentage of time that the process is waiting because the 
process asked to wait, such as a hibernate system service. Possible 
states are HIB, HIBO, SUSP, SUSPO, or LEF waiting for a TQE. 
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Table 3-17 describes the Job Quota data shown in Figure 3-24. 


Table 3-17 Job Quotas 


Data                Description                                       AUTHORIZE Quota

Open File Count     Current number of open files compared with the    FILLM
                    possible limit.

Paging File Count   Current number of disk blocks in the page file    PGFLQUOTA
                    that the process can use compared with the
                    possible limit. Note that this value is in
                    pagelets (512-byte pages) for compatibility and
                    consistency with VAX systems.

Enqueue Count       Current number of resources (lock blocks)         ENQLM
                    queued compared with the possible limit.

TQE Count           Current number of timer queue entry (TQE)         TQELM
                    requests compared with the possible limit.

Subprocess Count    Current number of subprocesses created            PRCLM
                    compared with the possible limit.

Byte Count          Current number of bytes used for buffered I/O     BYTLM
                    transfers compared with the possible limit.


3.3.7 RAD Counters 


Table 3-18 describes the RAD Counters data shown in Figure 3-24. The RAD
(Resource Affinity Domain) Counters data page is displayed for Alpha and I64
systems.


Table 3-18 RAD Counters Data 


Data Description 

Private Number of process private pages on RAD 0. 
Shared Number of process shared pages on RAD 0. 
Global Number of global pages on RAD 0. 
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Displaying OpenVMS Cluster Data 


The Availability Manager Data Analyzer displays a great deal of OpenVMS 
Cluster data. The amount of cluster information has increased in each successive 
version of the product and will probably continue to do so. To accommodate 
future growth in this area, beginning with Availability Manager Version 2.2, 
OpenVMS Cluster information is documented in a separate chapter of this 
manual. 


By clicking a series of “handles” on the cluster node tree in the Cluster Members
pane of the Cluster Summary page (Figure 4-1), you can open lines of data to
display progressively more detailed cluster data. This chapter describes the data
you can display.


Support for Managed Objects 


New support has been added to the OpenVMS Data Collector, RMDRIVER, for 
OpenVMS managed objects, which are operating system components with 
characteristics that allow the Availability Manager to manage them. Managed 
objects, which register themselves with the Data Collector at system startup, not 
only provide data but also implement fixes in response to client requests. 


In OpenVMS Version 7.3 and later versions, cluster data and fixes are available 
for LAN virtual circuits through the managed object interface. When the Data 
Analyzer connects to a Data Collector node, it retrieves a list of the managed 
objects on that node, if any. For such a node, the Data Analyzer can provide 
additional details and any new data that would otherwise be unavailable. 


Note 


To enable managed object data collection on nodes running OpenVMS 
Version 7.3 and later, the system manager must take steps so that the 
Data Collector driver, RMDRIVER, is loaded early in the boot process. 
For more details on how to enable collection of managed object data, see 
the HP Availability Manager Installation Instructions. 


LAN Displays 


When you monitor OpenVMS Version 7.3 and later nodes with managed objects 
enabled, additional cluster data and fixes are available for LAN virtual circuits. 
This data includes enhanced LAN virtual circuit summary data in the Cluster 
Summary window and the LAN Virtual Circuit Details (NISCA) window. In 
addition, the Cluster Summary includes virtual circuit, channel, and device fixes. 
If managed object support is not enabled for a Data Collector node, then only 
basic virtual circuit data is available. 
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4.1 OpenVMS Cluster Summary Page 


To display the OpenVMS Cluster Summary page (Figure 4-1), click the Cluster
Summary tab on an OpenVMS Node Summary page (Figure 1-7).


The Cluster Summary page contains cluster interconnect information for an 
entire cluster as well as detailed information about each node in the cluster, 
including System Communications Services (SCS) circuits and connections for 
individual nodes. 


The data items shown on this page correspond to data that the Show
Cluster utility (SHOW CLUSTER) displays for the SYSTEMS, MEMBERS,
CONNECTIONS, and CIRCUITS classes. No SHOW CLUSTER counterpart
exists for the PEDRIVER LAN virtual circuit, channel, and device detail displays.
The data items shown on the page also correspond to data that the SCACP utility
displays for SHOW commands that display PORT, CIRCUIT, VC, CHANNEL,
and LAN DEVICE information.


Figure 4-1 OpenVMS Cluster Summary

[Screen image: Cluster Summary page. The Summary pane (top) shows Formed,
Last Trans, Votes, Expected Votes, Failover Step, Members In, Members Out,
Quorum, QD Votes, and Failover ID. The Cluster Members pane (bottom) lists
one row per cluster node.]
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The two panes in the Cluster Summary page display the following information: 


• The Summary pane (top) displays summary information about the entire
cluster.


• The Cluster Members pane (bottom) displays detailed information about each
node in the cluster, including its System Communications Architecture (SCA)
connections with other nodes.
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4.1.1 OpenVMS Cluster Event 


The Data Analyzer signals the LOVOTE event when cluster votes minus cluster
quorum is less than the threshold value for the event. (The default threshold for
the LOVOTE event is 1.)


LOVOTE, ‘node’ VOTES count is close to or below QUORUM
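
The LOVOTE condition is a simple margin test. The sketch below restates it in code; the function name `lovote_signaled` and the constant name are hypothetical, and only the comparison itself (votes minus quorum less than the threshold) and the default threshold of 1 come from the text above.

```python
# Hypothetical restatement of the LOVOTE condition described above.
LOVOTE_DEFAULT_THRESHOLD = 1  # default threshold stated in the text


def lovote_signaled(votes, quorum, threshold=LOVOTE_DEFAULT_THRESHOLD):
    """Return True when cluster votes minus cluster quorum is below the threshold."""
    return (votes - quorum) < threshold


# Votes of 16 against a quorum of 10 leave a margin of 6, so no event.
print(lovote_signaled(16, 10))
# Votes equal to quorum leave a margin of 0, which is below the default threshold.
print(lovote_signaled(10, 10))
```

With the default threshold, the event is signaled only when the cluster has no votes to spare above quorum; raising the threshold gives earlier warning.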


4.1.2 OpenVMS Cluster Summary Pane 
Table 4-1 describes the data in the OpenVMS Cluster Summary pane
(Figure 4-1).


Table 4-1 Summary Pane Data


Data              Description

Formed            Date and time the cluster was formed.

Last Trans        Date and time of the most recent cluster state transition.

Votes             Total number of quorum votes being contributed by all cluster
                  members and by the quorum disk.

Expected Votes    The expected votes contribution by all members of the cluster.
                  This value is calculated from the maximum EXPECTED_VOTES
                  system parameter and the maximized value of the VOTES system
                  parameter.

Failover Step     Current failover step index. Shows which step in the sequence
                  of failover steps the failover is currently executing.

Members In        Number of cluster members to which the Data Analyzer has a
                  connection.

Members Out       Number of cluster members to which the Data Analyzer either
                  has no connection or has lost its connection.

Quorum¹           Number of votes that must be present for the cluster to
                  function and to permit user activity, that is, to “maintain
                  cluster quorum.”

QD Votes          Number of votes given to the quorum disk. A value of 65535
                  means no quorum disk exists.

Failover ID       Failover instance identification. Unique ID of a failover
                  sequence that indicates to system managers whether a failover
                  has occurred since the last time they checked.


¹You can adjust the quorum value by using the Adjust Quorum fix described in Section 6.2.1.


4.1.3 OpenVMS Cluster Members Pane 


The Cluster Members pane (the lower pane on the Cluster Summary page,
Figure 4-1) lists all the nodes in the cluster and provides detailed information
about each one. Figure 4-2 shows only the Cluster Members pane.
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Figure 4-2 OpenVMS Cluster Members Pane 
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The first level of information in the Cluster Members pane is cluster member 
data, which is described in Table 4-2.


Table 4-2 Cluster Member Data 


Data              Description

SCS Name          System Communications Services (SCS) name for the node (system
                  parameter SCSNODE).

SCSID             SCS identification for the node (system parameter SCSYSTEMID).

CSID              Cluster system identification.

Votes             Number of votes the member contributes.

Expect            Member’s expected votes as set by the EXPECTED_VOTES system
                  parameter.

Quorum            Number of votes that must be present for the cluster to
                  function and permit user activity, that is, to “maintain
                  cluster quorum.”

LckDirWt          Lock manager distributed directory weight as determined by the
                  LCKDIRWT system parameter.

Status            Current cluster member status:

                  Status Value   Description

                  NEW            New system in cluster.
                  BRK_NEW        New system; there has been a break in the
                                 connection.
                  MEMBER         System is a member of the cluster.
                  BRK_MEM        Member; there has been a break in the connection.
                  NON            System is not a member of the cluster.
                  BRK_NON        Nonmember; there has been a break in the
                                 connection.
                  REMOVED        System has been removed from the cluster.
                  BRK_REM        System has been removed from the cluster, and
                                 there has also been a break in the connection.

Transition Time   The time of the system’s last change in cluster membership
                  status.
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4.2 Summary Data in the Cluster Members Pane 


The following sections contain descriptions of the categories of summary data
displayed in the Cluster Members pane (Figure 4-2).


When you click the handle before an SCS (System Communications Services) 
Name, the Data Analyzer first displays a Ports heading, if managed object data 
collection is enabled on this SCS node. 


A port is an OpenVMS device that provides SCA (System Communications
Architecture) services. Port summary data is discussed in Section 4.2.1. Below
the Ports heading is the Circuits heading, which precedes a line of SCA headings.
(SCA data is discussed in Section 4.2.2.)


4.2.1 Port Summary Data 


When you initially click the handle in front of Ports in the Cluster Members pane 
(Figure 4-1) to a vertical position, Ports headings are displayed, with information 
about port interfaces on the local system, as shown in Figure 4-3. 


Figure 4-3 Port Summary Data 
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The port summary data shown in Figure 4—3 is described in Table 4-3. Data 
items in this table are related to the SCACP utility SHOW PORTS display and 
the SHOW CLUSTER utility LOCAL_PORT CLASS display. 
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Table 4-3 Local Port Data 


Data                  Description

Local Port:
  Name                Device name of the port.
  Number              The local port’s interconnect address or other
                      interconnect-specific identifier.

Mgmt Priority         Management priority assigned to the port.

Load Class            Capacity value of the port, based on the rate (in
                      megabits/second) of the interconnect of the port.

Messages Sent:
  Count               Total number of SCS messages sent since the port was
                      initialized.
  Rate                Rate at which SCS messages are sent (per second).

Messages Received:
  Count               Total number of SCS messages received since the port was
                      initialized.
  Rate                Rate at which SCS messages are received (per second).

Datagrams Sent:
  Count               Total number of SCS datagrams sent since the port was
                      initialized.
  Rate                Rate at which SCS datagrams are sent (per second).

Datagrams Received:
  Count               Total number of SCS datagrams received since the port was
                      initialized.
  Rate                Rate at which SCS datagrams are received (per second).

Kilobytes Mapped      Number of kilobytes mapped for block transfer.


4.2.2 SCA (System Communications Architecture) Summary Data 


Below the Circuits heading in Figure 4-4 is a line of SCA summary headings
that include information about a node’s SCS circuits between local SCA ports and
remote SCA ports on other nodes in the cluster. More than one circuit indicates
more than one communications path to the other node.


The data displayed in Figure 4-4 is similar to the information that the Show
Cluster utility (SHOW CLUSTER) displays for the CIRCUITS, CONNECTIONS,
and COUNTERS classes and that the SCACP utility’s SHOW CIRCUITS
command displays. Note that circuit count is the total number of events since
the state of the circuit changed to OPEN.


Starting with Availability Manager Version 2.2, the circuits display shows circuits 
to non-OpenVMS nodes, such as storage controllers. 
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Figure 4-4 SCA Summary Data 
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Table 4—4 describes the SCA summary data displayed under the Circuits 
heading in Figure 4-4. Each line of data shows either a summary of an SCS
connection between a local system application (SYSAP) and a remote SYSAP
that uses the circuit, or a summary of interconnect-specific
information about the operation of the circuit.


Some of the data described in Table 4-4 is not displayed in Figure 4-4 because
the screen display is wider than shown. You can scroll to the right on your
terminal screen to display the remaining fields described in the table.


Note 


Each rate referred to in Figure 4-4 is in messages per second. The
“Message Rates” data are rates; the remaining data items are counts.


Table 4-4 SCA Summary Data 
Data                      Description

Remote Node               SCS name of the remote node containing the remote port
                          of the circuit.

Local Port                The device name of the local port associated with the
                          circuit.

Remote Port:
  Type                    The remote port’s device or interconnect type
                          associated with the circuit (for example, LAN, CIPCA,
                          DSSI).
  Number                  The remote port’s interconnect address, or another
                          interconnect-specific unique identifier.

State                     The state of the virtual circuit connection.

Priority:
  Curr                    Circuit’s current priority, which is the sum of the
                          management priorities assigned to the circuit and
                          associated local port.
  Mgmt                    Priority value assigned to the circuit by management
                          action.

Load Class                The circuit’s current capacity rating, derived from
                          the current ECS member’s load class values.

Message Rates:
  Sent                    Count/rate of SCS messages sent over the circuit.
  Received                Count/rate of SCS messages received on the circuit.

Block Data (Kilobytes):
  Mapped                  Count/rate of kilobytes mapped for block data
                          transfers over the circuit.
  Sent                    Count/rate of kilobytes sent over the circuit using
                          send block data transfers.
  Requested               Count/rate of kilobytes requested from the remote port
                          over the circuit using request block data transfers.

Block Data (Count):
  Sent                    Count/rate of send block data transfers over the
                          circuit.
  Requested               Count/rate of block data transfer requests sent over
                          the circuit.

Datagrams:
  Sent                    Count/rate of SCS datagrams sent over the circuit.
  Received                Count/rate of SCS datagrams received on the circuit.

Credit Wait               Count/rate of times any connection on the circuit had
                          to wait for a send credit.

Buff Desc Wait            Count/rate of times any connection over the circuit
                          had to wait for a buffer descriptor.


4.2.3 SCS (System Communications Services) Connections Summary Data 


You can click the handle at the beginning of an SCA data row to display the 
following headings when they apply to a particular node: 


• SCS Connections
• LAN Virtual Circuit Summary


To display SCS connections summary data, click the handle at the beginning
of the “SCS Connections” row on the Cluster Summary pane (Figure 4-1).
Figure 4-5 displays SCS connections data.
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Figure 4—5 SCS Connections Data 
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Table 4—5 describes the SCS connections data shown in Figure 4—5. Some of the 
data described in Table 4-5 is not displayed in Figure 4-5 because the screen
display is wider than shown. You can scroll to the right on your terminal screen 
to display the remaining fields described in the table. 


Note that connection count is the total number of events since the state of the 
connection changed to OPEN. 


Table 4-5 SCS Connections Data 


Data                      Description

SYSAPs:
  Local                   Name of the SYSAP (system application) on the local
                          system associated with the connection.
  Remote                  Name of the SYSAP on the remote system associated with
                          the connection.

State                     The connection’s current state. The possible items
                          displayed are:

                          • ACCP_SENT: An accept request has been sent.
                          • CLOSED: The connection is closed.
                          • CON_ACK: A connect request has been sent and
                            acknowledged.
                          • CON_REC: A connect request has been received.
                          • CON_SENT: A connect request has been sent.
                          • DISC_ACK: A disconnect is acknowledged.
                          • DISC_MTCH: A disconnect request has matched.
                          • DISC_REC: A disconnect request has been received.
                          • DISC_SENT: A disconnect request has been sent.
                          • LISTEN: The connection is in the listen state.
                          • OPEN: The connection is open.
                          • REJ_SENT: A rejection has been sent.
                          • VC_FAIL: The virtual circuit has failed.

Message Rates:
  Sent                    Count/rate of SCS messages sent over the connection.
  Received                Count/rate of SCS messages received on the
                          connection.

Block Data (Kilobytes):
  Mapped                  Count/rate of kilobytes mapped for block data
                          transfers by the local SYSAP using the connection.
                          Note: This field is available only in raw data format.
  Sent                    Number of kilobytes sent over the SCS connection by
                          the local SYSAP using send block data transfers.
  Requested               Number of kilobytes requested over the SCS connection
                          by the local SYSAP using request block data transfers.

Block Data (Number):
  Sent                    Count/rate of send block data transfers by this node
                          over the SCS connection.
  Requested               Count/rate of request block data transfers sent to the
                          remote port over the SCS connection.

Datagrams:
  Sent                    Count/rate of datagrams sent on the SCS connection.
  Received                Count/rate of datagrams received on the SCS
                          connection.

Credit Wait               Count/rate of times the connection had to wait for a
                          send credit.

Buff Desc Wait            Count/rate of times the connection had to wait for a
                          buffer descriptor.


4.2.4 LAN Virtual Circuit Summary Data 


You can display interconnect-specific LAN virtual circuit summary data by 
clicking the handle at the beginning of a “LAN Virtual Circuit Summary” row to 
a vertical position. The screen expands to display the interconnect-specific VC 
summary data shown in Figure 4-6. 


Figure 4-6 LAN Virtual Circuit Summary Data 
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Much of the data in this display corresponds to the information displayed by the 
SCACP command SHOW VC. The SHOW CLUSTER command does not provide 
a corresponding display. Which data items are displayed depends on the type of 
interconnect the virtual circuit is using. 
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Currently, this feature is available only for LAN virtual circuits. VC Summary 
displays for other cluster interconnects such as CI might be available in the
future. When other interconnects are supported, the interconnect type will
be displayed at the beginning of the line (for example, CI Virtual Circuit
Summary), and the associated heading will have interconnect-specific data items.


Note that LAN Virtual Circuit counters are initialized when PEDRIVER detects 
the existence of a PEDRIVER on a remote system. All of a LAN VC’s counters 
are cumulative from that time. 


Some of the data described in Table 4-6 is not displayed in Figure 4-6 because 
the screen display is wider than shown. You can scroll to the right on your 
terminal screen to display the remaining fields described in the table. 


Table 4-6 describes the LAN Virtual Circuit Summary data items shown in
Figure 4-6.


Table 4-6 LAN Virtual Circuit Summary Data


Data                  Description

VC State              Current internal state of the virtual circuit:

                      • OPEN: Virtual circuit is open and usable.
                      • PATH: At least one open channel has been established,
                        but the virtual circuit has not yet transitioned to
                        OPEN.
                      • CLOSED: The virtual circuit has been closed or has
                        become unusable.

Total Errors          Number of times the virtual circuit has been closed or
                      has had other errors.

ReXmt Ratio           Ratio of total numbers of transmitted to retransmitted
                      packets during the most recent data collection interval.

Channels:
  Open                Number of currently open channels available to the
                      virtual circuit.
  ECS                 Number of equivalent channel set (ECS) channels currently
                      in use by the LAN virtual circuit.

ECS Priority          Priority a channel must have in order to be included in
                      the equivalent channel set (ECS). It is the highest
                      priority any open and tight channel has.

MaxPktSiz             Maximum data buffer size in use by this LAN virtual
                      circuit.

ReXmt TMO (usec)      Retransmission timeout, in microseconds. The length of
                      time the virtual circuit is currently using to wait for
                      an acknowledgment of the receipt of a packet before
                      retransmitting that packet.

XmtWindow:
  Cur                 Current value of the transmit window (or pipe quota).
                      Maximum number of packets that are sent before stopping
                      to await an acknowledgment. After a timeout, the
                      transmit window is reset to 1 to decrease congestion; it
                      is allowed to increase as acknowledgments are received.
  Max                 Maximum transmit window size currently allowed for the
                      virtual circuit.

Xmt Options           Transmit options enabled:

                      • CKSM: packet checksumming
                      • CMPR: compression

Packets:
  Sent                Number of packets sent over this virtual circuit.
  Received            Number of packets received over this virtual circuit.

Most recent:
  Time Opened         Most recent time the virtual circuit was opened.
  Time Closed         Most recent time the virtual circuit was closed.
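
The relationship between the XmtWindow Cur and Max values in Table 4-6 can be pictured with a small model. This is a sketch only: the class and method names are hypothetical, and growing the window by one packet per acknowledgment is an assumption made for illustration. The table states only that the window is reset to 1 after a timeout and is allowed to increase as acknowledgments are received.

```python
# Hypothetical model of the transmit window behavior described in Table 4-6.
class TransmitWindow:
    def __init__(self, maximum, current=1):
        self.maximum = maximum   # corresponds to XmtWindow Max
        self.current = current   # corresponds to XmtWindow Cur

    def on_timeout(self):
        # After a retransmission timeout the window is reset to 1
        # to decrease congestion.
        self.current = 1

    def on_ack(self):
        # Assumption: grow by one per acknowledgment, never past the maximum.
        self.current = min(self.current + 1, self.maximum)


window = TransmitWindow(maximum=4)
for _ in range(5):
    window.on_ack()
print(window.current)   # window has grown to the maximum allowed
window.on_timeout()
print(window.current)   # window is back to 1 after a timeout
```

A Cur value that stays well below Max over several collection intervals would, under this model, suggest repeated timeouts on the virtual circuit.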


4.2.5 LAN Path (Channel) Summary Data 


A LAN path or channel is a logical communication path between two LAN 
devices. Channels between nodes are determined by a local device, a remote 
device, and the connecting network. For example, two nodes, each having two 
devices, might establish four channels between the nodes. The packets that 
a particular LAN virtual circuit carries can be sent over any open channel 
connecting the two nodes. 
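
The four-channel example follows from pairing every local device with every remote device. The sketch below enumerates those pairs; the device names are hypothetical placeholders, and the result is an upper bound, because a channel exists only where the connecting network actually joins the two devices.

```python
# Illustration of the "two devices per node, four channels" example above.
from itertools import product


def possible_channels(local_devices, remote_devices):
    """Return every (local device, remote device) pair that could form a channel."""
    return list(product(local_devices, remote_devices))


# Hypothetical device names for two nodes with two LAN devices each.
channels = possible_channels(["EWA", "EWB"], ["EIA", "EIB"])
for local, remote in channels:
    print(f"{local} <-> {remote}")
print(len(channels))
```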


The difference between channels and virtual circuits is that channels provide 
datagram service. Virtual circuits, layered on channels, provide error-free 
paths between nodes. Multiple channels can exist between nodes in an OpenVMS 
Cluster system, but only one LAN-based virtual circuit can exist between any two 
nodes at a time. 


LAN channel counters are initialized when PEDRIVER detects the existence of
a LAN device on a remote system. All of a LAN channel’s counters are cumulative
from that time. For more information about channels and virtual circuits, see the
HP OpenVMS Cluster Systems manual.


Displaying Data 

You can display LAN channel summary data by clicking the handle at the 
beginning of a “LAN Virtual Circuit Summary Data” row (Figure 4-6), or by 
right-clicking a data item and choosing the Channel Summary item from the 
shortcut menu. The screen expands to display the LAN channel summary data 
shown in Figure 4-6. If there is no handle at the beginning of a “LAN Virtual 
Circuit Summary” data row, then managed object data collection is not enabled 
for this SCS node. 


The data items displayed depend on the type of virtual circuit. Currently, this 
feature is available only for LAN virtual circuits. 


Some of the data described in Table 4-7 is not displayed in Figure 4-6 because
the screen display is wider than shown. You can scroll to the right on your 
terminal screen to display the remaining fields described in the table. 
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Table 4-7 LAN Path (Channel) Data 


Data              Description

Devices:
  Local           Local LAN device associated with the channel.
  Remote          Remote LAN device associated with the channel.

Channel State     One of the following states:

                  • OPEN: Channel is usable.
                  • PATH: Channel handshake has been completed and, if usable,
                    will transition to OPEN.
                  • CLOSED: Channel has been shut down or is unusable.

Total Errors      Total of various error counters for this channel (see channel
                  details for breakdown).

ECS State         Channel ECS membership information:

                  • Y: Member
                  • N: Nonmember

                  Losses (one of the following):

                  • T (tight): Packet loss history is acceptable.
                  • L (lossy): Recent history of packet losses makes channel
                    unusable.

                  Capacity (one of the following):

                  • P (peer): Priority and buffer size both match the highest
                    corresponding values of the set of tight channels,
                    entitling the channel to be an ECS member.
                  • I (inferior): Priority or buffer size does not match the
                    corresponding values of the set of tight channels.
                  • S (superior): Priority or buffer size is better than the
                    current corresponding values of the set of ECS member
                    channels. This is a short-lived, transient state because it
                    exists only while the ECS membership criteria are being
                    re-evaluated.
                  • U (unevaluated): Priority or buffer size, or both, have not
                    been evaluated against the ECS criteria, usually because
                    the channel is lossy.

                  Speed (one of the following):

                  • F (fast): Channel delay is among the best for tight and
                    peer channels.
                  • S (slow): Channel delay makes channel too slow to be usable
                    because it would limit the virtual circuit’s average delay.

                  Note: If a channel is lossy, its capacity and speed are not
                  always kept current. Therefore, displayed values might be
                  those that the channel had at the time it became lossy.

Priority:
  Cur             Current priority used to evaluate the channel for ECS
                  membership. This is the sum of management priority values
                  assigned to the LAN device.
  Mgmt            Dynamic management-assigned priority.

Hops              Number of switches or bridges in this channel’s network path
                  to the remote LAN device.

BufSiz            Current maximum amount of SCS data that can be contained in a
                  packet sent over the channel. It is the smallest of the
                  following values:

                  • Local LAN device buffer sizes
                  • Remote LAN device buffer sizes
                  • Local NISCS_MAX_PKTSZ system (SYSGEN) parameter values
                  • Remote NISCS_MAX_PKTSZ system (SYSGEN) parameter values
                  • Largest packet size determined by the NISCA Channel Packet
                    Size probing algorithm that the intervening network can
                    deliver

Delay (usec)      Running average of measured round-trip time, in microseconds,
                  for packets sent over the channel.

Load Class        Load class initialized from local and remote LAN device bit
                  rates.

Packets:
  Sent            Number of packets sent on this channel, including control
                  packets.
  Received        Number of packets received by this channel.

Most recent:
  Time Opened     Last time this channel had a verified usable path to a remote
                  system.
  Time Closed     Time that this channel was last closed.
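
The BufSiz entry in Table 4-7 is defined as the smallest of five limits. A minimal sketch of that rule follows; the function and parameter names are hypothetical, and the byte values in the usage lines are made-up sample numbers, not values taken from any real configuration.

```python
# Hypothetical sketch of the BufSiz rule in Table 4-7: the channel's buffer
# size is the smallest of five limits.
def channel_bufsiz(local_device, remote_device,
                   local_max_pktsz, remote_max_pktsz, probed_size):
    """Return the smallest of the five limits listed for BufSiz."""
    return min(local_device, remote_device,
               local_max_pktsz, remote_max_pktsz, probed_size)


# Sample numbers only: the local device is the limiting factor here.
bufsiz = channel_bufsiz(
    local_device=1426,       # local LAN device buffer size
    remote_device=4382,      # remote LAN device buffer size
    local_max_pktsz=8192,    # local NISCS_MAX_PKTSZ
    remote_max_pktsz=8192,   # remote NISCS_MAX_PKTSZ
    probed_size=4382,        # largest size the intervening network delivers
)
print(bufsiz)
```

Because the result is a minimum, raising a single limit has no effect on BufSiz unless that limit was the smallest one.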


4.3 Detailed Data Accessed Through the Cluster Members Pane 


The following sections describe data that appears on lines that you can open in
the Cluster Members pane (Figure 4-2).


4.3.1 LAN Device Summary Data 


You can display LAN device summary data by first right-clicking a node name 
on the Cluster Members pane. On Version 7.3 or later nodes on which managed 
objects are enabled, the Data Analyzer displays a menu with the following 
choices: 


• SCA Summary
• LAN Device Summary...


Click LAN Device Summary... to display the Device Summary Data page
(Figure 4-7).


Figure 4-7 LAN Device Summary Data

[Screen image: LAN Device/IP Interface Summary page, with the hint
“Right-click LAN Device or IP Interface data item for options and fixes.”]


You can right-click any data item on the page to display a menu with LAN 
Device Fixes... on it. These fixes are explained in Chapter 6. 


Table 4-8 describes the LAN device summary data displayed in Figure 4-7. This
data is also displayed with the SCACP command SHOW LAN_DEVICE.


Table 4-8 LAN Device Summary Data 


Data              Description

LAN Device        Name of the LAN device used for cluster communications
                  between local and remote nodes.

                  The icon preceding each LAN device can be one of the
                  following colors:

                  • Black: not enabled (“Not in use by SCA”)
                  • Yellow: “Run” not set
                  • Red: “Run” and anything other than Online, Local, or
                    Restart
                  • Green: “Run” and a combination of Online, Local, and
                    Restart only

                  A tooltip indicates the possible states a device can be in.
                  This can be a combination of the following: Run, Online,
                  Local, Hello_Busy, Build_Hello, Init, Wait_Mgmt, Wait_Evnt,
                  Broken, XChain_Disabled, Delete_pend, Restart, or
                  Restart_Delay. Alternatively, a tooltip might display “Not in
                  use by SCA.”

Type              Type of LAN device used for the cluster.

Errors            Number of errors reported by the device since cluster
                  communications began using it.

Management:
  Priority        Current management-assigned priority of the device.
  BufSize         Current management-assigned maximum buffer size of the
                  device.

BufSize           Smaller of interconnect-specific buffer size of the device
                  and its current management-assigned buffer size.

Messages:
  Sent            Number of LAN packets sent by the device.
  Received        Number of packets received from remote LAN device.
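
The icon color rules in Table 4-8 amount to a short decision procedure over the device's state flags. The sketch below is one reading of those rules, not the product's code; the function name `icon_color` and the `in_use` argument are hypothetical, while the state names and the four colors come from the table.

```python
# Hypothetical sketch of the LAN device icon color rules in Table 4-8.
BENIGN_STATES = {"Online", "Local", "Restart"}


def icon_color(states, in_use=True):
    """Map a set of device state names to the icon color described in the table."""
    if not in_use:
        return "black"          # "Not in use by SCA"
    if "Run" not in states:
        return "yellow"         # "Run" not set
    others = set(states) - {"Run"} - BENIGN_STATES
    # "Run" plus anything outside Online, Local, Restart is red; otherwise green.
    return "red" if others else "green"


print(icon_color({"Run", "Online", "Local"}))
print(icon_color({"Run", "Online", "Broken"}))
print(icon_color({"Online"}))
print(icon_color(set(), in_use=False))
```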


4.3.2 LAN Device Detail Data 


To display LAN device detail data, right-click a LAN Path (Channel) Summary 
data item on the LAN Virtual Circuit Summary data page (Figure 4-6). The Data 
Analyzer then displays the shortcut menu shown in Figure 4-8. 


Figure 4—8 LAN Path (Channel) Details Menu 
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To display device details, select the LAN Device Details... item on the menu. 
After a brief delay, a LAN Device Overview Data page (Figure 4—9) is displayed. 


A series of tabs at the top of the LAN Device Overview Data page indicate
additional LAN device pages that you can display. Much of the LAN device
detail data corresponds to data displayed by the SCACP command
SHOW LAN_DEVICE.


4.3.2.1 LAN Device Overview Data


The LAN Device Overview Data page (Figure 4-9) displays LAN device summary
data.
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Figure 4-9 LAN Device Overview Data 
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Table 4-9 describes the data displayed in Figure 4-9. 


Table 4-9 LAN Device Overview Data 


Data                    Description

Status                  Device status: Run, Online, Local, Hello_Busy,
                        Build_Hello, Init, Wait_Mgmt, Wait_Evnt, Broken,
                        XChain_Disabled, Delete_pend, Restart, or
                        Restart_Delay. Alternatively, “Not in use by SCA” can
                        be displayed.

Device Name             Name of the LAN device.

Device Type             OpenVMS device type value.

Total Errors            Total number of errors listed on the Errors page.

Priority                Dynamic management-assigned priority.

Max Buffer Size         Maximum data buffer size for this LAN device.

Mgmt Buffer Size        Dynamic management-assigned maximum block data field
                        size.

Load Class              Load class. The rate in MBs currently being reported by
                        the LAN device.

Receive Ring Size       Number of packets the LAN device can buffer before it
                        discards incoming packets.

Default LAN Address     LAN device’s hardware LAN address.

Current LAN Address     Current LAN address being used by this LAN device.


4.3.2.2 LAN Device Transmit Data


The LAN Device Transmit Data page (Figure 4-10) displays LAN device transmit
data.

Figure 4-10 LAN Device Transmit Data

Table 4-10 describes the data displayed in Figure 4-10.


Table 4-10 LAN Device Transmit Data 


Data                    Description

Messages Sent           Number of packets sent by this bus, including
                        multicast “Hello” packets.

Bytes Sent              Number of bytes in packets sent by this LAN device,
                        including multicast “Hello” packets.

Multicast Msgs Sent     Number of multicast “Hello” packets sent by this LAN
                        device.

Multicast Bytes Sent    Number of multicast bytes in “Hello” packets sent by
                        this LAN device.

Outstanding I/O Count   Number of transmit requests being processed by the
                        LAN driver.


4.3.2.3 LAN Device Receive Data 


The LAN Device Receive Data page (Figure 4-11) displays LAN device receive
data.

Figure 4-11 LAN Device Receive Data

Table 4-11 describes the data displayed in Figure 4-11.


Table 4-11 LAN Device Receive Data 


Data                    Description

Messages Rcvd           Number of packets received by this LAN device,
                        including multicast packets.

Bytes Received          Number of bytes in packets received by this LAN
                        device, including multicast packets.

Multicast Msgs Rcvd     Number of multicast NISCA packets received by this
                        LAN device.

Multicast Bytes Rcvd    Number of multicast bytes received by this LAN device.


4.3.2.4 LAN Device Events Data 


The LAN Device Events Data page (Figure 4-12) displays LAN device events
data.

Figure 4-12 LAN Device Events Data

Table 4-12 describes the data displayed in Figure 4-12.


Table 4-12 LAN Device Events Data 


Data                    Description

Port Usable             Number of times the LAN device became usable.

Port Unusable           Number of times the LAN device became unusable.

Address Change          Number of times the LAN device’s LAN address changed.

Restart Failures        Number of times the LAN device failed to restart.

Last Event              Event type of the last LAN device event (for example,
                        LAN address change, an error, and so on).

Time of Last Event      Time the last event occurred.


4.3.2.5 LAN Device Errors Data 


The LAN Device Errors Data page (Figure 4-13) displays LAN device errors
data.

Figure 4-13 LAN Device Errors Data

Table 4-13 describes the data displayed in Figure 4-13.


Table 4-13 LAN Device Errors Data 


Data                          Description

Bad SCSSYSTEM ID              Received a packet with the wrong SCSSYSTEM ID
                              in it.

MC Msgs Directed to TR        Number of multicast packets directed to the
Layer                         NISCA Transport layer.

Short CC Messages Received    Number of packets received that were too short
                              to contain a NISCA channel control header.

Short DX Messages Received    Number of packets received that were too short
                              to contain a NISCA DX header.

CH Allocation Failures        Number of times the system failed to allocate
                              memory for use as a channel structure in
                              response to a packet received by this LAN
                              device.

VC Allocation Failures        Number of times the system failed to allocate
                              memory for use as a VC structure in response to
                              a packet received by this LAN device.

Wrong Port                    Number of packets addressed to the wrong NISCA
                              address.

Port Disabled                 Number of packets discarded because the LAN
                              device was disabled.

H/W Transmit Errors           Number of local hardware transmit errors.

Hello Transmit Errors         Number of transmit errors during HELLOs.

Last Transmit Error Reason    Reason for last transmit error.

Time of Last Transmit Error   Time of last transmit error: date and time.


4.3.3 LAN Path (Channel) Detail Data


To display LAN path (channel) detail data, right-click a LAN channel summary 
data item on the Cluster Summary page (Figure 4-6). The Data Analyzer 
displays a shortcut menu with the options shown in Figure 4-8. 


To display LAN channel details, select the Channel Details... item on the
menu. After a brief delay, a LAN Channel Overview Data page (Figure 4-14)
is displayed. A series of tabs at the top of this page indicate additional channel
pages that you can display.


4.3.3.1 LAN Channel Overview Data


The LAN Channel Overview Data page (Figure 4-14) displays general channel
data, including the state, status, and total errors of the channel.


Figure 4-14 LAN Channel Overview Data

[Screen capture: Channel Details: 2BOYS (EWA) to AMDS (ESA), Overview tab]


Table 4-14 describes the data displayed in Figure 4-14. 


Table 4-14 LAN Channel Overview Data 


Data                    Description

State                   Channel’s current state: OPEN, PATH, or CLOSED.

Status                  Channel status.

Total Errors            Sum of channel’s error counters.

Time Opened             Last time that this channel had a path to a remote
                        system.

Time Closed             Last time that this channel was closed.

Total Time Open         Total time that this channel has been open.

Device Name             Local LAN device name.

Device Type             Local LAN device type.

Average RTT             Average of measured round-trip time.

RSVP Threshold          Number of packets before requesting that the remote
                        node immediately return an acknowledgment.

Remote Ring Size        Number of entries in the remote LAN device.

Remote Device Type      Remote LAN device type.

Remote T/R Cache        Number of out-of-order packets that the remote
                        transmit/receive resequencing cache can buffer.

LAN H/W Address         LAN device’s hardware address.


4.3.3.2 LAN Channel Counters Data 


The LAN Channel Counters Data page (Figure 4-15) displays path counters data, 
including ECS transitions as well as messages and bytes sent. 


Figure 4-15 LAN Channel Counters Data

Table 4-15 describes the data displayed in Figure 4-15.

Table 4-15 LAN Channel Counters Data

Data                          Description

ECS Transitions               Number of times this channel has been in and
                              out of the equivalent channel set (ECS).

Messages Sent                 Number of packets sent over this channel,
                              including control packets.

Bytes Sent                    Number of bytes transmitted on this channel,
                              including control packets.

Control Messages Sent         Number of control packets sent, not including
                              multicast packets.

Control Msg Bytes Sent        Number of control packet bytes sent, not
                              including multicast packets.

Messages Received             Number of packets received by this channel.

Bytes Received                Number of bytes in packets received by this
                              channel.

MC Control Messages Rcvd      Number of multicast control packets received.

MC Control Msg Bytes Rcvd     Number of multicast control packet bytes
                              received.

Control Messages Rcvd         Number of control packets received.

Control Msg Bytes Rcvd        Number of control packet bytes received.


4.3.3.3 LAN Channel Errors Data 


The LAN Channel Errors Data page (Figure 4-16) displays LAN channel errors
data.

Figure 4-16 LAN Channel Errors Data

Table 4-16 describes the data displayed in Figure 4-16.


Table 4-16 LAN Channel Errors Data 


Data                    Description

Seq Retransmit          Number of times a sequenced VC packet sent on this
                        channel was retransmitted, and the channel was
                        penalized for the lost packet.

LAN Transmit Failures   Number of times the local LAN device reported a
                        failure to transmit a packet, and the channel was
                        penalized for the lost packet.

Restart Channel         Channel was closed and restarted because a channel
                        control packet was received indicating that the other
                        end closed the channel and is restarting the channel
                        handshake.

Channel Init Timeouts   Channel initialization handshake timeout.

Listen Timeouts         No packets of any kind, including HELLOs, were
                        received in LISTEN_TIMEOUT seconds.

Bad Authorization Msg   Received a CC (channel control) packet with a bad
                        authorization field.

Bad ECO CC Msg          Received a CC packet with an incompatible NISCA
                        protocol ECO rev. field value.

Bad Multicast Msg       Received a bad multicast CC packet.

CC Short Packet         Received a CC packet that was too short.

CC Incompatible         Received a CC packet that was incompatible with
                        existing channels for this virtual circuit.

Rcv Old Channel         Received a packet from an old instance of a channel.

No MSCP Server          No MSCP server available to respond to a received
                        channel control solicit service packet asking this
                        node to boot serve another node.

Disk Not Served         Disk is not served by this system.

Buffer Size Change      Change in buffer size.
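
Table 4-14 defines Total Errors as the sum of the channel's error counters. The following Python sketch illustrates that arithmetic only. It assumes that the counters summed are the ones listed in Table 4-16; the dictionary, the function name total_errors, and the sample values are hypothetical and are not taken from a real system.

```python
# Hypothetical snapshot of the error counters listed in Table 4-16.
# All values are invented for illustration.
channel_errors = {
    "Seq Retransmit": 4,
    "LAN Transmit Failures": 0,
    "Restart Channel": 1,
    "Channel Init Timeouts": 0,
    "Listen Timeouts": 1,
    "Bad Authorization Msg": 0,
    "Bad ECO CC Msg": 0,
    "Bad Multicast Msg": 0,
    "CC Short Packet": 0,
    "CC Incompatible": 0,
    "Rcv Old Channel": 0,
    "No MSCP Server": 0,
    "Disk Not Served": 0,
    "Buffer Size Change": 0,
}


def total_errors(counters):
    """Return the sum of all error counters for one channel."""
    return sum(counters.values())


print("Total Errors:", total_errors(channel_errors))
```

A nonzero total is a cue to open the Errors page and see which individual counter is contributing.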


4.3.3.4 LAN Channel Remote System Data 


The LAN Channel Remote System Data page (Figure 4-17) displays LAN path
remote system data.

Figure 4-17 LAN Channel Remote System Data

Table 4-17 describes the data displayed in Figure 4-17.


Table 4-17 LAN Channel Remote System Data 


Data                    Description

Node Name               Node name of remote system.

Buffer Size             Buffer size (largest possible buffer size) of remote
                        system.

Max Buffer Size         Current upper bound on buffer size usable on this
                        channel.

Services                NISCA services supported on this channel.

Dev Name                Name of the remote LAN device.

LAN Address             Remote hardware address.

H/W Type                Hardware type of remote node.

Protocol Version        NISCA protocol version of remote system.


4.3.3.5 LAN Channel ECS (Equivalent Channel Set) Criteria Data 


The LAN Channel ECS Criteria Data page (Figure 4-18) displays equivalent
channel set criteria data.

Figure 4-18 LAN Channel ECS Criteria Data

[Screen capture: Channel Details: 2BOYS (EWA) to AMDS (ESA), ECS Criteria tab;
other tabs: Overview, Counters, Errors, Remote System]


Table 4-18 describes the data displayed in Figure 4-18. 


Table 4-18 LAN Channel ECS Criteria Data 


Data                      Description

ECS Membership            ECS membership status; that is, Member or
                          Nonmember.

Time Entered ECS          Last time this channel entered the ECS.

Time Exited ECS           Last time this channel exited the ECS.

Total Time in ECS         Total time this channel was in the ECS.

Losses                    Value representing channel’s recent packet loss
                          history.

Capacity                  Channel’s capacity rating based on evaluating its
                          priority, buffer size, and hops values relative to
                          the current ECS criteria. Values are: Ungraded,
                          Peer, Inferior, Superior.

Priority                  Channel’s current priority for ECS calculations; it
                          is the sum of the management priorities assigned to
                          the local LAN device and to the channel.

Management Priority       Dynamic management-assigned priority.

Buffer Size               Negotiated maximum common buffer size: the smaller
                          of local and remote BUS$ limits on block data field
                          sizes.

Management Buffer Size    Maximum block data field size assigned by dynamic
                          management.

Hops                      Number of switches or bridges for this channel.

Management Hops           Management-supplied hops or media packet storage
                          equivalent.

Speed                     Classification of channel’s delay relative to that
                          of the lowest delay of any ECS member.

Average RTT               Average measured round-trip time.

Load Class                Lesser of the local and remote LAN device load
                          class values.

Local Seq Number          Sequence number of the local channel.

Remote Seq Number         Sequence number of the remote channel.
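
Three of the values in Table 4-18 are derived from other values: Priority is a sum, and Buffer Size and Load Class are each the smaller of a local and a remote value. The following Python sketch shows only those three derivations. The class name, field names, and sample numbers are hypothetical; they are not PEDRIVER data structures, and the sketch does not attempt the Capacity or Speed classifications.

```python
from dataclasses import dataclass


@dataclass
class ChannelInputs:
    """Hypothetical inputs for one channel (names are illustrative)."""
    device_mgmt_priority: int    # management priority of the local LAN device
    channel_mgmt_priority: int   # management priority of the channel
    local_buffer_limit: int      # local limit on block data field size
    remote_buffer_limit: int     # remote limit on block data field size
    local_load_class: int        # load class of the local LAN device
    remote_load_class: int       # load class of the remote LAN device


def derived_criteria(c):
    """Compute the derived values as Table 4-18 describes them."""
    return {
        # Sum of the two management priorities.
        "Priority": c.device_mgmt_priority + c.channel_mgmt_priority,
        # Smaller of the local and remote buffer limits.
        "Buffer Size": min(c.local_buffer_limit, c.remote_buffer_limit),
        # Lesser of the local and remote load class values.
        "Load Class": min(c.local_load_class, c.remote_load_class),
    }


example = ChannelInputs(2, -1, 1426, 1412, 100, 10)
print(derived_criteria(example))
```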


4.3.4 LAN Virtual Circuit Detail Data 


The Network Interconnect for System Communications Architecture (NISCA) is 
the transport protocol responsible for carrying packets such as disk I/Os and lock 
packets across Ethernet and FDDI LANs to other nodes in the cluster. 


The LAN virtual circuit details (NISCA) pages show detailed information about
the LAN Ethernet or FDDI connection between two nodes. The Data Analyzer
displays one window for each LAN virtual circuit. These pages are intended
primarily to provide real-time aids for diagnosing LAN-related cluster
communications problems. HP OpenVMS Cluster Systems describes the parameters
shown on these pages and tells how to diagnose LAN-related cluster problems.


The LAN Virtual Circuit Details pages provide the same information as the
SCACP command SHOW VC and as the following OpenVMS System Dump
Analyzer (SDA) commands: PE VC and SHOW PORTS/VC=VC_remote-node-name.
In these commands, remote-node-name is the SCS name of another node in
the cluster.


SDA defines VC_remote-node-name when it performs the first SHOW PORTS action
after SDA is started. Thus, the /CH and /VC options are valid only with the
second and subsequent SHOW PORTS commands.


You can display LAN virtual circuit details data by double-clicking a “LAN 
Virtual Circuit Summary” data row or by right-clicking a menu on the Cluster 
Summary page (Figure 4-6). After a brief delay, a LAN VC Transmit Data page 
(Figure 4-19) is displayed. The tabs at the top of the page indicate additional 
pages that you can display. 


The data items displayed depend on the type of virtual circuit. Currently, this 
feature is available only for LAN virtual circuits. 


4.3.4.1 LAN VC Transmit Data


Transmit data is information about the transmission of data packets, including
the numbers of packets and bytes sent. Figure 4-19 is an example of a LAN VC
Transmit Data page.

Figure 4-19 LAN VC Transmit Data

Table 4-19 describes the data displayed in Figure 4-19.


Table 4-19 LAN VC Transmit Data


Data                    Description

Packets Sent            (Raw) count and rate of packets transmitted through
                        the virtual circuit to the remote node, including
                        both sequenced and unsequenced (channel control)
                        packets and lone acknowledgments.

Bytes Sent              (Raw) count and rate of bytes transmitted through the
                        virtual circuit.

Unsequenced (DG)        (Raw) count and rate of the number of unsequenced
                        packets that are transmitted.

Sequenced               (Raw) count and rate of sequenced packets
                        transmitted. Sequenced packets are guaranteed to be
                        delivered.

ReXMT Ratio             Ratio of the total number of sequenced packets sent
                        to the current retransmission count.

Lone ACK                (Raw) count and rate of packets sent solely for the
                        purpose of acknowledging receipt of one or more
                        packets.

ReXMT Count             Number of packets retransmitted. Retransmission
                        occurs when the local node does not receive an
                        acknowledgment for a transmitted packet within a
                        predetermined timeout interval.

ReXMT Timeout           Number of retransmission timeouts that have occurred.

Options                 Transmit options enabled:
                        CKSM: packet checksumming
                        CMPR: compression
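
Table 4-19 describes ReXMT Ratio as the number of sequenced packets sent divided by the retransmission count. The following Python sketch works through that division. The function name and the sample counts are hypothetical, and returning None when nothing has been retransmitted is an assumption made here to avoid dividing by zero; this guide does not say what the Data Analyzer displays in that case.

```python
def rexmt_ratio(sequenced_sent, rexmt_count):
    """Sequenced packets sent per retransmitted packet.

    ASSUMPTION: returns None when no packets have been retransmitted,
    because the ratio is undefined in that case.
    """
    if rexmt_count == 0:
        return None
    return sequenced_sent / rexmt_count


# Invented sample counts: 50,000 sequenced packets, 25 retransmissions.
print(rexmt_ratio(50_000, 25))
```

With this definition, a larger ratio means fewer retransmissions for the same amount of sequenced traffic.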
4.3.4.2 LAN VC Receive Data


Receive data is information about the receipt of data packets. Figure 4-20 is an
example of a LAN VC Receive Data page.


Figure 4-20 LAN VC Receive Data

Table 4-20 describes the data displayed in Figure 4-20.


Table 4-20 LAN VC Receive Data 


Data                    Description

Packets Received        (Raw) count and rate of packets received on the
                        virtual circuit from the remote node, including both
                        sequenced and unsequenced (that is, datagram) packets
                        and lone acknowledgments.

Bytes Received          (Raw) count and rate of bytes received in packets
                        over the virtual circuit.

Unsequenced (DG)        (Raw) count and rate of unsequenced (datagram)
                        packets received.

Sequenced               (Raw) count and rate of sequenced packets received.

Lone ACK                (Raw) count and rate of lone acknowledgments
                        received.

Duplicate               Number of duplicated packets received by this system.
                        Duplicates occur when the sending node retransmits a
                        packet, and both the original and the retransmitted
                        packets are received.

Out of Order            Number of packets received out of order by this
                        system.

Illegal ACK             Number of illegal acknowledgments received; that is,
                        acknowledgments of an out-of-range sequence number.


4.3.4.3 LAN VC Congestion Control Data

LAN VC congestion control data is information about LAN traffic. The values 
indicate the number of packets that can be sent to the remote node before 
receiving an acknowledgment and the retransmission timeout. 


Figure 4-21 is an example of a LAN VC Congestion Control Data page. An item 
that is dimmed indicates that the current version of OpenVMS does not support 
that item. 


Figure 4-21 LAN VC Congestion Control Data

[Screen capture: 2BOYS Virtual Circuit to AMDS, Congestion Control tab;
other tabs: Transmit, Receive, Channel Selection, VC Closures, Packets
Discarded]

Table 4-21 describes the data displayed in Figure 4-21.


Table 4-21 LAN VC Congestion Control Data 


Data                          Description

Transmit Window Current       Current value of the transmit window (or pipe
                              quota). After a timeout, the pipe quota is
                              reset to 1 to decrease network path congestion.
                              The pipe quota is allowed to increase as
                              quickly as acknowledgments are received.

Transmit Window Grow          The slow growth threshold: the size at which
                              the increase rate of the window is slowed to
                              avoid congestion on the network again.

Transmit Window Max           Maximum transmit window size currently allowed
                              for the virtual circuit based on channel and
                              remote PEDRIVER receive cache limitations.

Transmit Window Max (mgmt)    Management override to calculated value for
                              Maximum Transmit Window size. N/A on systems
                              prior to Version 2.0.

Transmit Window Reached       Number of times the entire transmit window was
                              full. If this number is small compared with the
                              number of sequenced packets transmitted, then
                              either the local node is not sending large
                              bursts of data to the remote node, or
                              acknowledging packets are being received so
                              promptly that the window limit is never
                              reached.

Roundtrip Time                Average round-trip time, in microseconds, for a
                              packet to be sent and acknowledged.

                              VC round-trip time values depend on the delayed
                              ACK (ACK holdoff) delay, which is 100 ms, and
                              on the network traffic.

                              If there is sufficient cluster traffic, the
                              receive window at the remote node fills and the
                              ACK is delivered sooner.

                              If the cluster is idle with no traffic, there
                              may be a delay of 100 ms before the ACK is
                              sent. Hence, in an idle cluster with little
                              traffic, the VC round-trip delay value is
                              normally high. As the traffic increases, the VC
                              round-trip delay value drops.

Roundtrip Deviation           Average deviation, in microseconds, of the
                              round-trip time.

                              Deviation/Variance: Whenever a new ACK delay is
                              measured, it is compared with the current
                              estimate of the ACK delay. The difference is a
                              measure of the error in the delay estimate
                              (delayError). This delayError is used as a
                              correction to update the current estimate of
                              ACK delay.

                              To prevent a single "bad" measurement from
                              skewing the estimate, the correction due to a
                              single measurement is limited to a fraction.

                              The average of the absolute value of the
                              delayError from the mean is used as an estimate
                              of the delay's variance.

Retransmit Timeout            Value, in microseconds, used to determine
                              packet retransmission timeout. If a packet does
                              not receive either an acknowledging or a
                              responding packet, the packet is assumed to be
                              lost and will be resent.

UnAcked Packets               Current number of unacknowledged packets.

CMD Queue Length              Current length of the virtual circuit’s command
                              queue.

CMD Queue Max                 Maximum number of commands in the virtual
                              circuit’s command queue so far.
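
The Roundtrip Deviation entry in Table 4-21 describes a running estimate: each new ACK delay measurement yields a delayError, only a fraction of that error is applied as a correction, and the average absolute delayError serves as the deviation. The following Python sketch is one possible reading of that description, not the PEDRIVER implementation. The class name, the fraction of 1/8, and the sample delays are all assumptions chosen for illustration; this guide does not state the fraction actually used.

```python
class AckDelayEstimator:
    """Running estimate of ACK delay and its average deviation (microseconds)."""

    def __init__(self, initial_delay_us, fraction=0.125):
        # ASSUMPTION: 'fraction' limits how far one measurement can move the
        # estimates. 0.125 is an illustrative value, not PEDRIVER's.
        self.delay_us = float(initial_delay_us)
        self.deviation_us = 0.0
        self.fraction = fraction

    def update(self, measured_delay_us):
        # Error of the current estimate against the new measurement.
        delay_error = measured_delay_us - self.delay_us
        # Apply only a fraction of the error as the correction.
        self.delay_us += self.fraction * delay_error
        # Running average of the absolute error, used as the deviation.
        self.deviation_us += self.fraction * (abs(delay_error) - self.deviation_us)
        return self.delay_us, self.deviation_us


estimator = AckDelayEstimator(initial_delay_us=1000.0)
# Invented samples; the third is a deliberately "bad" measurement.
for sample_us in (1800.0, 900.0, 100000.0, 1100.0):
    delay, deviation = estimator.update(sample_us)
    print(f"sample={sample_us:9.1f}  delay={delay:9.1f}  deviation={deviation:9.1f}")
```

Because only a fraction of each error is applied, the outlying third sample moves the estimate part of the way toward the measurement instead of replacing it.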


4.3.4.4 LAN VC Channel Selection Data (Nonmanaged Objects)


The display of information about LAN VC channel selection depends on the
version of OpenVMS and whether managed objects have been enabled. (For more
information about managed objects, see the introduction to this chapter.)


Figure 4-22 is an example of a Nonmanaged Object LAN VC Channel Selection
Data page.

Figure 4-22 LAN VC Channel Selection Data (Nonmanaged Objects)

[Screen capture: DBGAVC Virtual Circuit to DRINKS, Channel Selection tab;
other tabs: VC Closures, Packets Discarded, Transmit, Receive, Congestion
Control]

Table 4-22 describes the data displayed in Figure 4-22.


Table 4-22 LAN VC Channel Selection Data (Nonmanaged Objects) 


Data                    Description

Buffer Size             Maximum data buffer size for this virtual circuit.

Channel Count           Number of channels available for use by this virtual
                        circuit.

Channel Selections      Number of channel selections performed.

Protocol                NISCA protocol version.

Local Device            Name of the local LAN device that the channel uses to
                        send and receive packets.

Local LAN Address       Address of the local LAN device that performs sends
                        and receives.

Remote Device           Name of the remote LAN device that the channel uses
                        to send and receive packets.

Remote LAN Address      Address of the remote LAN device performing the sends
                        and receives.


4.3.4.5 LAN VC Channel Selection Data (Managed Objects Enabled) 


Systems running the Data Collector with managed objects enabled collect and 
display the following information about LAN VC Channel Selection Data. (For 
more information about managed objects, see the introduction to this chapter.) 


Note 


An additional requirement for displaying some of the data on this data
page is that managed objects be enabled on your system. For more
information, see the HP Availability Manager Installation Instructions.

Figure 4-23 is an example of a LAN VC Channel Selection Data page with
managed objects enabled.


Figure 4-23 LAN VC Channel Selection Data (Managed Objects Enabled)

Table 4-23 describes the data displayed in Figure 4-23.


Table 4-23 Channel Selection Data (Managed Objects Enabled) 


Data                      Description

ECS Priority              Current minimum priority a tight channel must have
                          in order to be an ECS member.

Buffer Size               Maximum data buffer size for this virtual circuit.
                          A channel must have this buffer size in order to be
                          an ECS member.

Hops                      Current minimum management hops a channel must have
                          in order to be included in the ECS.

Channel Count             Number of channels currently available for use by
                          this virtual circuit.

Channel Selections        Number of channel selections performed.

Protocol                  Remote node’s NISCA protocol version.

Speed Demote Threshold    Current threshold for reclassifying a FAST channel
                          to SLOW.

Speed Promote Threshold   Current threshold for reclassifying a SLOW channel
                          to FAST.

Min RTT                   Current minimum average delay of any current ECS
                          members.

Min RTT Threshold         Current threshold for reclassifying a channel as
                          FASTER than the current set of ECS channels.

Mgmt Demote Threshold     A management-specified lower limit on the maximum
                          delay (in microseconds) an ECS member channel can
                          have. Whenever at least one tight peer channel has
                          a delay of less than the management-supplied value,
                          all tight peer channels with delays less than the
                          management-supplied value are automatically
                          included in the ECS. When all tight peer channels
                          have delays equal to or greater than the management
                          setting, the ECS membership delay thresholds are
                          automatically calculated and used.
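
The Mgmt Demote Threshold rule in Table 4-23 has two cases, which the following Python sketch separates. It is a reading of the description above under stated assumptions, not PEDRIVER code. The function name, the device names, and the delays are hypothetical, and the sketch returns None to stand for the case in which the automatically calculated thresholds apply, because this guide does not describe how those thresholds are calculated.

```python
def channels_included_by_mgmt_threshold(delays_us, mgmt_threshold_us):
    """Apply the management-supplied delay threshold to tight peer channels.

    delays_us: mapping of channel name to delay in microseconds, for tight
               peer channels only (hypothetical input format).
    Returns the set of channels the rule includes in the ECS, or None when
    every channel's delay is equal to or greater than the threshold, in
    which case the automatically calculated thresholds are used instead.
    """
    below = {name for name, delay in delays_us.items()
             if delay < mgmt_threshold_us}
    return below if below else None


# Invented delays for three tight peer channels.
delays = {"EWA": 700.0, "EWB": 1500.0, "EIA": 900.0}
print(channels_included_by_mgmt_threshold(delays, 1000.0))
print(channels_included_by_mgmt_threshold(delays, 500.0))
```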


4.3.4.6 LAN VC Closures Data 


LAN VC closures data is information about the number of times a virtual circuit
has closed for a particular reason. Figure 4-24 is an example of a LAN VC
Closures Data page.


An entry that is dimmed indicates that the current version of OpenVMS does not
support that item.


Figure 4-24 LAN VC Closures Data

Table 4-24 describes the data displayed in Figure 4-24.


Table 4-24 LAN VC Closures Data 

Data                    Description

No Path                 Number of times the VC was closed because no usable
                        LAN path was available.

SeqMsg TMO              Number of times the VC was closed because a sequenced
                        packet’s retransmit timeout count limit was exceeded.

Topology Change         Number of times the VC was closed because PEDRIVER
                        performed a failover from a LAN path (or paths) with
                        a large packet size to a LAN path with a smaller
                        packet size.

CC DFQ Empty            Number of times the VC was closed because the channel
                        control data-free queue (DFQ) was empty.

NPAGEDYN Low            Number of times the VC was closed because of a
                        nonpaged pool allocation failure in the local node.

LAN Xmt TMO             Number of times the VC was closed because the LAN
                        device used to send the packet did not report
                        transmit completion before the packet’s transmit
                        timeout limit was exceeded.


4.3.4.7 LAN VC Packets Discarded Data 


LAN VC packets discarded data is information about the number of times packets
were discarded for a particular reason. Figure 4-25 is an example of a LAN VC
Packets Discarded Data page.


Figure 4-25 LAN VC Packets Discarded Data

Table 4-25 describes the data displayed in Figure 4-25.


Table 4-25 LAN VC Packets Discarded Data 


Data                    Description

Bad Checksum            Number of times there was a checksum failure on a
                        received packet.

No Xmt Chan             Number of times no transmit channel was available.

Rcv Short Msg           Number of times an undersized transport packet was
                        received.

Ill Seq Msg             Number of times an out-of-range sequence numbered
                        packet was received.

TR DFQ Empty            Number of times the transmit data-free queue (DFQ)
                        was empty.

TR MFQ Empty            Number of times the TR layer message-free queue (MFQ)
                        was empty.

CC MFQ Empty            Number of times the channel control MFQ was empty.

Rcv Window Miss         Number of packets that could not be placed in the
                        virtual circuit’s receive cache because the cache was
                        full.


Note


Before you start this chapter, be sure to read the explanations of data 
collection, events, thresholds, and occurrences in Chapter 1. 


The Availability Manager Data Analyzer indicates resource availability 
problems in the Event pane (Figure 5-1) of the main System Overview window 
(Figure 1-1). 


Figure 5-1 OpenVMS Event Pane 


[Screen capture of the Event pane. Columns: Node, Group, Date & Time,
Severity, Event, Description. The sample rows, dated 06-Jan-2004, show
HIDIOR, HINTER, HMPSYN, and LOVLSP events, each with a severity of 60.]


The Event pane helps you identify system problems. In many cases, you can 
apply fixes to correct these problems as well, as explained in Chapter 6. 


The Data Analyzer displays a warning message in the Event pane whenever
it detects a resource availability problem. If logging is enabled (the default),
the Data Analyzer also logs each event in the Event Log file, which you can
display or print. (For the location of this file and a cautionary note about it, see
Section 5.2.)


5.1 Event Information Displayed in the Event Pane 


The Data Analyzer can display events for all nodes that are currently in
communication with the Data Analyzer. When an event of a certain severity
occurs, the Data Analyzer adds the event to a list in the Event pane.

The length of time an event is displayed depends on the severity of the event.
Less severe events are displayed for a short period of time (30 seconds); more
severe events are displayed until you explicitly remove the event from the Event
pane (explained in Event Pane Menu Options).


Data in the Event Pane 


Table 5-1 provides additional information about the data items that are displayed 
in the Event pane. 


Table 5-1 Event Pane Data

Data Item     Description

Node          Name of the node causing the event
Group         Group of the node causing the event
Date          Date the event occurred
Time          Time that an event was detected
Sev           Severity: a value from 0 to 100. (You can customize this value
              to indicate the importance of the event, with 100 as the most
              important.)
Event         Alphanumeric identifier of the type of event
Description   Short description of the resource availability problem


Appendix B contains tables of events that are displayed in the Event pane. In 
addition, these tables contain an explanation of each event and the recommended 
remedial action. 


Event Pane Menu Options 


When you right-click a node name or data item in the Event pane, the Data 
Analyzer displays a shortcut menu with the following options: 


Menu Option       Description

Display           Displays the Node Summary page associated with that event.
Remove            Removes an event from the display.
Freeze/Unfreeze   Freezes a value in the display until you “unfreeze” it; a
                  snowflake icon is displayed to the left of an event that is
                  frozen.
Customize         Allows you to customize events.


5.2 Criteria for Evaluating an Event 


During data collection, any time data meets or exceeds the threshold for an 
event, an occurrence counter is incremented. When the incremented value 
matches the value in the Occurrence box on the Event Customization page 
(Figure 5-2), the event is posted in the Event pane of the System Overview 
window (Figure 1-1). 
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Figure 5-2 Sample Event Customization 


[Screenshot: Customization - OpenVMS Default Settings dialog, Event
Customization page for the DSKERR event (high disk device error count). The
page shows Severity (60), Occurrence, and Threshold (15 errors) fields;
Escalation actions check boxes (User, OPCOM, HP OpenView); and a User Action
section. The "Event explanation and investigation hints" box reads: "The error
count for the disk device exceeds the threshold. Check error log entries for
device errors. A disk device with a high error count could indicate a problem
with the disk or with the connection between the disk and the system."]


The sample Event Customization page indicates a threshold of 15 errors and
an occurrence value of 2. This means that if the disk error count meets or
exceeds the threshold of 15 for two consecutive data collections, the DSKERR
event is posted in the Event pane.


Note that some events are triggered when data is lower than the threshold; other 
events are triggered when data is higher than the threshold. 


If, at any time during data collection, the data does not meet or exceed the
threshold, the occurrence counter is set to zero, and the event is removed from
the Event pane. Figure 5-3 depicts this sequence.




Figure 5-3 Testing for Events 
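
The test sequence for an event can also be sketched in code. The following Python fragment is a minimal illustration only, not the Data Analyzer's implementation: the class name `EventEvaluator`, its fields, and the sample data values are assumptions made for this example, and it models only events that are triggered when data is at or above the threshold.

```python
from dataclasses import dataclass


@dataclass
class EventEvaluator:
    """Hypothetical model of the occurrence counter for one event on one node."""
    threshold: float      # value from the Threshold box
    occurrence: int       # value from the Occurrence box
    count: int = 0        # consecutive collections at or above the threshold
    posted: bool = False  # True while the event is shown in the Event pane

    def collect(self, value: float) -> bool:
        """Process one data collection and return whether the event is posted."""
        if value >= self.threshold:
            self.count += 1
            if self.count >= self.occurrence:
                self.posted = True
        else:
            # Data fell below the threshold: reset the counter and remove the event.
            self.count = 0
            self.posted = False
        return self.posted


# Threshold 15 and occurrence 2, as in the sample Event Customization page.
dskerr = EventEvaluator(threshold=15, occurrence=2)
history = [dskerr.collect(v) for v in (16, 12, 17, 18, 20, 3)]
print(history)
```

In this run the event is posted only on the second consecutive collection at or above 15 (the value 18), stays posted for the next collection, and is removed as soon as a collection falls below the threshold.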
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5.3 Criteria for Posting and Displaying an Event 
When an event is posted, the following actions occur:

• The event is displayed in the Event pane.

• The data associated with the event is collected at the Event interval shown
  on the Data Collection Customization page (Figure 5-4). In this example, the
  event is associated with the Disk Status data collection.
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Figure 5-4 OpenVMS Data Collection Customization 


[Screenshot: Customization - OpenVMS Default Settings dialog, Data Collection
page. The page lists data collections with interval columns that include
Display and NoEvent. A legend explains that icons indicate the customization
level currently in effect, for example, whether the current settings come from
the Availability Manager built-in set or from the Application level.]


On the Data Collection Customization page, for example, the Event interval 
for Disk Status data collection is every 15 seconds. 


Figure 5-5 OpenVMS Group/Node Pane 


[Screenshot: Group/Node pane with columns that include Groups/Nodes, #CPUs,
CPU, DIO, CPU Qs, Events, Proc Ct, OS Version, and HW Model. The sample rows
show an HP rx3600 node running V8.3 and a VAXstation 4000-90 node running
V7.1.]


When an event is posted, the following actions also occur: 


• The Events field in the Group/Node pane is incremented, and the node icon
  in the Node Name field turns red (see Figure 5-5). You can see the events
  posted for this node in a tooltip by placing the mouse over the Node Name.

• The event is added to the Event Log file by default:

  - On OpenVMS systems, the Event Log file is:

    AMDS$AM_LOG:ANALYZEREVENTS_CONNi_yyyymmdd-hhmm.LOG

    The i in the file name is an integer indicating the connection in the
    Data Analyzer. The other lowercase letters indicate the date and time
    the log file was created.

  - On Windows systems, the Event Log file name has the same format, and the
    file is located in the installation directory.
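
If you archive Event Log files with a script, the naming convention can be decoded as in the following Python sketch. It assumes only the name format described above; the function name `parse_log_name`, the optional OpenVMS version suffix handling, and the sample file name are illustrative assumptions.

```python
import re
from datetime import datetime

# Pattern for ANALYZEREVENTS_CONNi_yyyymmdd-hhmm.LOG, where i is the connection
# number. An optional OpenVMS file version (for example, ";1") is tolerated.
LOG_NAME = re.compile(
    r"ANALYZEREVENTS_CONN(\d+)_(\d{8})-(\d{4})\.LOG(;\d+)?$", re.IGNORECASE
)


def parse_log_name(name: str):
    """Return (connection, creation time) for an Event Log file name, or None."""
    match = LOG_NAME.search(name)
    if match is None:
        return None
    connection = int(match.group(1))
    created = datetime.strptime(match.group(2) + match.group(3), "%Y%m%d%H%M")
    return connection, created


# Hypothetical file name used only for illustration.
print(parse_log_name("ANALYZEREVENTS_CONN1_20040106-1728.LOG"))
```

Because the pattern is searched rather than matched from the start, the same function accepts a name that is prefixed with a device, logical name, or directory.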


The Event Log consists of the following fields: 


Table 5-2 Event Columns

Event Column   Description

Group name     AMDS Group name
Node           Node name for the OpenVMS system
Date/Time      The date and time for the Event Log entry
Severity       Severity of the event
Event          Alphanumeric event identifier
EventKey       A hex value identifying an event for a node. For instance, all
               HINTER events for a node have the same value. Each time the
               HINTER event is signaled for a node, the value will be the
               same, making it easy to search for all the HINTER events for a
               node.
EventID        A hex value identifying an individual event. For instance, if
               the HICOMQ event on node SAM is signaled, the BEGIN and
               END/CANCELD/EXPIRED entries that mark when the event was
               signaled and cancelled will have the same value. The next time
               the HICOMQ event is signaled on node SAM, the hex value will be
               different. This value makes it easy to find the entry that
               signals when the event has been cancelled.
Status         The value describes the status of the event. Values are as
               follows:

               Status Value   Description

               INFO           This event is informational.
               BEGIN          The event entry marks the beginning of the
                              interval when the values for an event have
                              exceeded the threshold.
               END            The event entry marks the end of the interval
                              when the values for an event have exceeded the
                              threshold.
               CANCELD        The event entry marks when the event was removed
                              because the data used to evaluate the event is
                              no longer being collected.
               EXPIRED        The event entry marks when the event has expired.

Description    Event description
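
The EventID column makes it possible to match each BEGIN entry with the entry that closes it. The following Python sketch shows the idea. Because the on-disk layout of the Event Log is not described here, the sketch works on entries that have already been parsed into tuples; the tuple layout, the function name `event_intervals`, and the EventID values are all hypothetical.

```python
from datetime import datetime

# Status values that close an interval opened by a BEGIN entry.
CLOSING = {"END", "CANCELD", "EXPIRED"}


def event_intervals(entries):
    """Pair each BEGIN entry with its closing entry by EventID.

    entries: iterable of (timestamp, event, event_id, status) tuples.
    Returns a list of (event, event_id, begin_time, end_time, closing_status).
    """
    open_events = {}
    intervals = []
    for timestamp, event, event_id, status in entries:
        if status == "BEGIN":
            open_events[event_id] = (event, timestamp)
        elif status in CLOSING and event_id in open_events:
            name, began = open_events.pop(event_id)
            intervals.append((name, event_id, began, timestamp, status))
    return intervals


# Hypothetical, already-parsed entries; the EventID values are invented.
sample = [
    (datetime(2004, 1, 6, 17, 0, 0), "HICOMQ", "0A01", "BEGIN"),
    (datetime(2004, 1, 6, 17, 0, 5), "HINTER", "0A02", "BEGIN"),
    (datetime(2004, 1, 6, 17, 2, 30), "HICOMQ", "0A01", "END"),
]
for name, event_id, began, ended, status in event_intervals(sample):
    print(name, event_id, status, (ended - began).total_seconds())
```

Entries whose BEGIN has no closing entry yet (the HINTER event in this sample) are simply left out of the result.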


Caution About Event Logs 


If you collect data on many nodes, running the Data Analyzer for a long 
period of time can result in a large event log. For example, in a run that 
monitors more than 50 nodes with most of the background data collection 
enabled, the event log can grow by up to 30 MB per day. At this rate, 
systems with small disks might fill up the disk on which the event log 
resides. 


Closing the Data Analyzer application allows you to access the event log 
for tasks such as archiving. Starting the Data Analyzer starts a new 
event log. 
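
As a rough planning aid, you can estimate how long a disk will last at a given log growth rate. The following Python sketch uses the 30 MB per day figure cited in the caution as a default; the function name `days_until_full` and the 600 MB of free space are assumptions for illustration.

```python
def days_until_full(free_mb: float, growth_mb_per_day: float = 30.0) -> float:
    """Estimate how many days of event logging fit in the free disk space."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return free_mb / growth_mb_per_day


# Hypothetical disk with 600 MB free, at the 30 MB/day upper figure cited above.
print(days_until_full(600))
```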
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5.4 Displaying Additional Event Information 


For more detailed information about a specific event, double-click any event 
data item in the Event pane. The Data Analyzer first displays a data page that 
most closely corresponds to the cause of the event. You can choose other tabs for 
additional detailed information. 


For a description of data pages and the information they contain, see Chapter 3. 
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Performing Fixes on OpenVMS Nodes 


Fixes allow you to resolve resource availability problems and improve system 
availability. 


This chapter discusses the following topics: 
• Understanding fixes

• Performing fixes


Caution 


Performing certain fixes can have serious repercussions, including 
possible system failure. Therefore, only experienced system managers 
should perform fixes. 


6.1 Understanding Fixes 


When you suspect or detect a resource availability problem, in many cases you 
can use the Availability Manager Data Analyzer to analyze the problem and to 
perform a fix to improve the situation. 


Data Analyzer fixes fall into the following categories: 


• Node fixes

• Process fixes

• Disk fixes

• Cluster interconnect fixes


You can access fixes, by category, from the pages listed in Table 6-1. 


Table 6-1 Accessing Availability Manager Fixes

Fix Category and Name                 Available from This Page

Node fixes:                           Node Summary
  Crash Node                          CPU
  Adjust Quorum                       Memory Summary
                                      I/O Process
                                      SCA Port
                                      SCA Circuit
                                      LAN Virtual Circuit
                                      LAN Path (Channel)
                                      LAN Device

Process fixes:                        All of the process fixes are available
  General process fixes:              from the following pages:
    Delete Process
    Exit Image                          Memory Summary
    Suspend Process                     I/O Process
    Resume Process                      CPU Process
    Process Priority                    Single Process
  Process memory fixes:
    Purge Working Set (WS)
    Adjust Working Set (WS)
  Process limits fixes:
    Direct I/O
    Buffered I/O
    AST
    Open file
    Lock
    Timer
    Subprocess
    I/O Byte
    Pagefile Quota

Disk fixes:                           All of the disk fixes are available from
  Cancel disk MV                      the following pages:
  Cancel SSM MV
                                        Disk Status Summary
                                        Disk Volume Summary

Cluster interconnect fixes:           These fixes are available from the
                                      following lines of data on the Cluster
                                      Summary page (Figure 4-1):

  SCA Port:                           Right-click a data item on the Local Port
    Adjust Priority                   Data display line to display a menu. Then
                                      select Port Fix....

  SCA Circuit:                        Right-click a data item on the Circuits
    Adjust Priority                   Data display line to display a menu. Then
                                      select Circuit Fix....

  LAN Virtual Circuit Summary:        Right-click a data item on the LAN
    Maximum Transmit Window Size      Virtual Circuit Summary line to display a
    Maximum Receive Window Size       menu. Then select VC LAN Fix....
    Checksumming                      Alternatively, you can use the Fix menu
    Compression                       on the LAN VC Details page.
    ECS Maximum Delay

  LAN Path (Channel) Summary:         Right-click a data item on the LAN Path
    Adjust Priority                   (Channel) Summary line to display a menu.
    Hops                              Then select Fixes.... Alternatively, you
                                      can use the Fix menu on the Channel
                                      Details page.

  LAN Device Details:                 You can access these fixes in the
    Adjust Priority                   following ways:
    Set Maximum Buffer Size
    Start LAN Device                  • Right-click an item in the LAN Path
    Stop LAN Device                     (Channel) Summary category to display
                                        a menu. Then select LAN Device
                                        Details... to display pages containing
                                        Fix options.

                                      • Right-click an item in the LAN Device
                                        Summary page and then select LAN
                                        Device Fixes....

                                      • Select Fixes... on the LAN Device
                                        Details page.


Table 6-2 summarizes various problems, recommended fixes, and the expected
results of fixes.

Table 6-2 Summary of Problems and Matching Fixes

Problem                         Fix                 Result

Node resource hanging cluster   Crash Node          Node fails with operator-requested
                                                    shutdown. See Section 6.2.2 for the
                                                    crash dump footprint for this type of
                                                    shutdown.

Cluster hung                    Adjust Quorum       Quorum for cluster is adjusted.

Process looping, intruder       Delete Process      Process no longer exists.

Endless process loop in same PC Exit Image          Exits from current image.
range

Runaway process, unwelcome      Suspend Process     Process is suspended from execution.
intruder

Process previously suspended    Resume Process      Process starts from point it was
                                                    suspended.

Runaway process or process that Process Priority    Base priority changes to selected
is overconsuming                                    setting.

Low node memory                 Purge Working Set   Frees memory on node; page faulting
                                (WS)                might occur for process affected.

Working set too high or low     Adjust Working Set  Removes unused pages from working set;
                                (WS)                page faulting might occur.

Process quota has reached its   Adjust Process      Process limit is increased, which in
limit and has entered RWAIT     Limits              many cases frees the process to
state                                               continue execution.

Process has exhausted its       Adjust Pagefile     Pagefile quota limit of the process is
pagefile quota                  Quota               adjusted.

Disk volume is in mount verify  Cancel disk MV      Disk volume is taken out of the mount
state                                               verify state and put into the mount
                                                    verify timeout state. The disk can now
                                                    be dismounted with the
                                                    $ DISMOUNT/ABORT command.

Shadow set is in mount verify   Cancel SSM MV       The shadow set member is ejected from
state due to a shadow set                           the shadow set, enabling the shadow
member being in a mount verify                      set to return to a mounted state. This
state                                               is equivalent to the
                                                    $ SET SHADOW/FORCE_REMOVAL command.


Most process fixes correspond to an OpenVMS system service call, as shown in
the following table:

Process Fix                               System Service Call

Delete Process                            $DELPRC
Exit Image                                $FORCEX
Suspend Process                           $SUSPND
Resume Process                            $RESUME
Process Priority                          $SETPRI
Purge Working Set (WS)                    $PURGWS
Adjust Working Set (WS)                   $ADJWSL
Adjust process limits of the following:   None
  Direct I/O (DIO)
  Buffered I/O (BIO)
  Asynchronous system trap (AST)
  Open file (FIL)
  Lock queue (ENQ)
  Timer queue entry (TQE)
  Subprocess (PRC)
  I/O byte (BYT)


Note 


Each fix that uses a system service call requires that the process execute 
the system service. A hung process has the fix queued to it, and the fix 
does not execute until the process is operational again. 
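
The table and note above can be summarized as a small lookup. The following Python sketch is illustrative only: the dictionary simply restates the table, and the function names `system_service` and `waits_for_process` are assumptions, not part of any Availability Manager interface.

```python
from typing import Optional

# Process fix -> OpenVMS system service, restating the table above.
SYSTEM_SERVICE = {
    "Delete Process": "$DELPRC",
    "Exit Image": "$FORCEX",
    "Suspend Process": "$SUSPND",
    "Resume Process": "$RESUME",
    "Process Priority": "$SETPRI",
    "Purge Working Set": "$PURGWS",
    "Adjust Working Set": "$ADJWSL",
}


def system_service(fix_name: str) -> Optional[str]:
    """Return the system service for a process fix, or None for process limit fixes."""
    return SYSTEM_SERVICE.get(fix_name)


def waits_for_process(fix_name: str) -> bool:
    """True if the fix is queued to the target process and runs only once it is operational."""
    return system_service(fix_name) is not None


print(system_service("Suspend Process"), waits_for_process("Direct I/O"))
```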


Be aware of the following facts before you perform a fix: 


• You must have write access to perform a fix. To perform LAN fixes, you must
  have control access.

• You cannot undo many fixes. For example, after using the Crash Node fix, the
  node must be rebooted (either by the node if the node reboots automatically,
  or by a person performing a manual boot).

• Do not apply the Exit Image, Delete Process, or Suspend Process fix to system
  processes. Doing so might require you to reboot the node.

• Whenever you exit an image, you cannot return to that image.

• You cannot delete processes that have exceeded their job or process quota.

• The Availability Manager Data Collector ignores fixes applied to the
  SWAPPER process.


How to Perform Fixes 


Standard OpenVMS privileges restrict users’ write access. When you run the 
Data Analyzer, you must have the CMKRNL privilege to send a write (fix) 
instruction to a node with a problem. 


The following options are displayed at the bottom of all fix pages:

Option   Description

OK       Applies the fix and then exits the page. Any message associated with
         the fix is displayed in the Event pane.
Cancel   Cancels the fix.
Apply    Applies the fix and does not exit the page. Any message associated
         with the fix is displayed in the Return Status section of the page and
         in the Event pane.


The following sections explain how to perform node, process and disk fixes. 


Note 


Node, process, and disk fixes generate an event when they are executed.
The events are entered into the event log on the system that is running
the Data Analyzer. See the "Events generated by fixes" section in
Table C-2 for a list of these events.


6.2 Performing Node Fixes 
Node fixes fall into the following categories:

• Fixes that allow you to deliberately fail (or crash) a node

• A fix that allows you to adjust cluster quorum

To perform a node fix, follow these steps:

1. On the Node Summary, CPU, Memory, or I/O page, select the Fix menu.

2. Select Fix Options.
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6.2.1 Adjust Quorum 


The default node fix displayed is the Adjust Quorum fix, which forces a node to
recalculate the quorum value. This fix is the equivalent of the Interrupt Priority
Level C (IPC) mechanism used at system consoles for the same purpose. The fix
forces the adjustment for the entire cluster so that each node in the cluster has
the same new quorum value.


The Adjust Quorum fix is useful when the number of votes in a cluster falls below 
the quorum set for that cluster. This fix allows you to readjust the quorum so 
that it corresponds to the current number of votes in the cluster. 
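
The effect of the adjustment can be illustrated with a short calculation. The following Python sketch assumes the standard OpenVMS Cluster rule that quorum is (votes + 2) / 2, truncated to an integer; that rule comes from the OpenVMS Cluster documentation rather than from this guide, and the function names and vote counts are hypothetical.

```python
def quorum_for(votes: int) -> int:
    """Quorum for a vote total, assuming the rule (votes + 2) / 2, truncated."""
    if votes < 0:
        raise ValueError("votes must not be negative")
    return (votes + 2) // 2


def is_hung(current_votes: int, quorum: int) -> bool:
    """A cluster suspends activity while its current votes are below quorum."""
    return current_votes < quorum


# Hypothetical cluster: 5 expected votes, but only 2 votes are still present.
old_quorum = quorum_for(5)
new_quorum = quorum_for(2)  # quorum recalculated from the current votes
print(old_quorum, is_hung(2, old_quorum), new_quorum, is_hung(2, new_quorum))
```

In this example the cluster is hung with quorum 3 and 2 votes present; after quorum is recalculated from the current votes, the quorum is 2 and the cluster can resume.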


The Adjust Quorum page is shown in Figure 6-1. 
Figure 6-1 Adjust Quorum 


[Screenshot: Availability Manager Fixes dialog for node FXEF80 with the Adjust
Quorum fix type selected. Explanation: "Adjusts cluster quorum. This fix will
cause the cluster to recalculate the cluster quorum. This fix will allow a
cluster that is hung because it has lost quorum to regain quorum and resume
operation." The dialog also shows a Return Status area and OK, Cancel, Apply,
and Help buttons.]
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6.2.2 Crash Node 
Caution 


The Crash Node fix is an operator-requested bugcheck from the Data 
Collector. It takes place as soon as you click OK in the Crash Node fix. 
After you perform this fix, the node cannot be restored to its previous 
state. After a crash, the node must be rebooted. 


When you select the Crash Node option, the Data Analyzer displays the Crash 
Node page, shown in Figure 6-2. 


Figure 6-2 Crash Node 


[Screenshot: Availability Manager Fixes dialog, Node tab, for node QTV18 with
the Crash Node fix type selected. Explanation: "Crashes the node. This fix will
attempt to crash the node. A successful return status means that the connection
to the node has been severed. CAUTION: Use as a last resort only!" The dialog
also shows a Return Status area and OK, Cancel, Apply, and Help buttons.]


Note 


Because the node cannot report a confirmation when a Crash Node fix 
is successful, the crash success message is displayed after the timeout 
period for the fix confirmation has expired. 


Recognizing a System Failure Forced by the Availability Manager 

Because a user with suitable privileges can force a node to fail from the Data 
Analyzer by using the Crash Node fix, system managers have requested a method 
for recognizing these particular failure footprints so that they can distinguish 
them from other failures. These failures all have identical footprints: they are 
operator-induced system failures in kernel mode at IPL 8. The top of the kernel 
stack is similar to the following display:
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SP => Quadword system address 
      Quadword data
      1BE0DEAD.00000000
      00000000.00000000
      Quadword data          TRAP$CRASH
      Quadword data          SYS$RMDRIVER + offset


6.3 Performing Process Fixes 


Process fixes fall into the following categories:

• Fixes that allow you to affect the process; for instance, to change its
  priority, suspend it, or resume it

• A fix that allows you to adjust the memory of a process

• A fix that allows you to adjust the quotas or limits of a process

To perform a process fix, follow these steps:

1. On the Memory or I/O page, right-click a process name.

2. Click Fix Options.

   The Data Analyzer displays these Process tabs:

   Process General
   Process Memory
   Process Limits

3. Click one of these tabs to bring it to the front.

4. Click the down arrow to display the process fixes in this group, as shown in
   Figure 6-3, where the Process General tab has been chosen.


Figure 6-3 Process General Options 


[Screenshot: Availability Manager Fixes dialog, Process General tab, for
process DNS$ADVER (0000004C) on node MONSON. The Fix Type list shows Process
Priority. Explanation: "Changes the base priority of the process". The Fix
Value is 4. The dialog also shows a Return Status area and OK, Cancel, and
Apply buttons.]


5. Select a process fix (for example, Process Priority, shown in Figure 6-3) to
   display a fix page.
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Some of the fixes, such as Process Priority, require you to use a slider to change 
the default value. When you finish setting a new process priority, click Apply at 
the bottom of the page to apply that fix. 
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6.3.1 General Process Fixes 


The following sections describe Data Analyzer general process fixes. These fixes 
include instructions telling how to delete, suspend, and resume a process. 


6.3.1.1 Delete Process 


In most cases, a Delete Process fix deletes a process. However, if a process is 
waiting for disk I/O or is in a resource wait state (RWAST), this fix might not 
delete the process. In this situation, it is useless to repeat the fix. Instead, 
depending on the resource the process is waiting for, a Process Limit fix might 
free the process. As a last resort, reboot the node to delete the process. 


Caution 


Deleting a system process can cause the system to hang or become 
unstable. 


When you select the Delete Process option, the Data Analyzer displays the page 
shown in Figure 6-4. 


Figure 6-4 Delete Process 


[Screenshot: Availability Manager Fixes dialog, Process General tab, for
process DNS$ADVER (0000004C) on node MONSON with the Delete Process fix type
selected. Explanation: "Deletes the selected process. This fix will cause the
process to be deleted. However, if the process is hung because it has exhausted
a process resource limit, this fix may not be able to delete the process. First
adjust the process resource limit with a Process Limit fix." The dialog also
shows a Return Status area and OK, Cancel, Apply, and Help buttons.]


After reading the explanation, click Apply at the bottom of the page to apply the 
fix. A message displayed on the page indicates that the fix has been successful. 
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6.3.1.2 Exit Image 


Exiting an image on a node can stop an application that a user requires. Make 
sure you check the Single Process page before you exit an image to determine 
which image is running on the node. 


Caution 


Exiting an image on a system process could cause the system to hang or 
become unstable. 


When you select the Exit Image option, the Data Analyzer displays the page 
shown in Figure 6-5. 


Figure 6-5 Exit Image Page 


[Screenshot: Availability Manager Fixes dialog, Process General tab, for
process DNS$ADVER (0000004C) on node MONSON with the Exit Image fix type
selected. Explanation: "Forces the image of the process to exit". The dialog
also shows a Return Status area and OK, Cancel, Apply, and Help buttons.]


After reading the explanation on the page, click Apply at the bottom of the page
to apply the fix. A message displayed on the page indicates that the fix has been
successful.
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6.3.1.3 Suspend Process 
Suspending a process that is consuming excess CPU time can improve perceived 
CPU performance on the node by freeing the CPU for other processes to use. 
(Conversely, resuming a process that was using excess CPU time while running 
might reduce perceived CPU performance on the node.) 


Caution 


Do not suspend system processes, especially JOB_CONTROL, because 
this might make your system unusable. (For more information, see HP 
OpenVMS Programming Concepts Manual, Volume I.) 


When you select the Suspend Process option, the Data Analyzer displays the page 
shown in Figure 6-6. 


Figure 6-6 Suspend Process 


[Screenshot: Availability Manager Fixes dialog, Process General tab, for
process DNS$ADVER (0000004C) on node MONSON with the Suspend Process fix type
selected. Explanation: "Suspends the process. This fix is equivalent to
$ SET PROCESS/SUSPEND." The dialog also shows a Return Status area and OK,
Cancel, Apply, and Help buttons.]


After reading the explanation, click Apply at the bottom of the page to apply the 
fix. A message displayed on the page indicates that the fix has been successful. 
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6.3.1.4 Resume Process 


Resuming a process that was using excess CPU time while running might reduce 
perceived CPU performance on the node. (Conversely, suspending a process 
that is consuming excess CPU time can improve perceived CPU performance by 
freeing the CPU for other processes to use.) 


When you select the Resume Process option, the Data Analyzer displays the page 
shown in Figure 6-7. 


Figure 6-7 Resume Process 


[Screenshot: Availability Manager Fixes dialog, Process General tab, for
process DNS$ADVER (0000004C) on node MONSON with the Resume Process fix type
selected. Explanation: "Resumes the process. This fix is equivalent to
$ SET PROCESS/RESUME." The dialog also shows a Return Status area and OK,
Cancel, and Apply buttons.]


After reading the explanation, click Apply at the bottom of the page to apply the 
fix. A message displayed on the page indicates that the fix has been successful. 
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6.3.1.5 Process Priority 


If the priority of a compute-bound process is too high, the process can consume 
all the CPU cycles on the node, affecting performance dramatically. On the other 
hand, if the priority of a process is too low, the process might not obtain enough 
CPU cycles to do its job, also affecting performance. 


When you select the Process Priority option, the Data Analyzer displays the page 
shown in Figure 6-8. 


Figure 6-8 Process Priority 


[Screenshot: Availability Manager Fixes dialog, Process General tab, for
process DNS$ADVER (0000004C) on node MONSON with the Process Priority fix type
selected. Explanation: "Changes the base priority of the process". The Fix
Value slider is set to 4. The dialog also shows a Return Status area and OK,
Cancel, and Apply buttons.]


To change the base priority for a process, drag the slider on the scale to the 
number you want. The current priority number is displayed in a small box above 
the slider. You can also click the line above or below the slider to adjust the 
number by 1. 


When you are satisfied with the new base priority, click Apply at the bottom of 
the page to apply the fix. A message displayed on the page indicates that the fix 
has been successful. 
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6.3.2 Process Memory Fixes 


The following sections describe the Availability Manager fixes you can use to
correct process memory problems: the Purge Working Set and Adjust Working Set
fixes.


6.3.2.1 Purge Working Set 


This fix purges the working set to a minimal size. You can use this fix to reclaim 
a process’s pages that are not in active use. If the process is in a wait state, the 
working set remains at a minimal size, and the purged pages become available 
for other uses. If the process becomes active, pages the process needs are
page-faulted back into memory, and the unneeded pages are available for other
uses.


Be careful not to repeat this fix too often: a process that continually reclaims 
needed pages can cause excessive page faulting, which can affect system 
performance. 


When you select the Purge Working Set option, the Data Analyzer displays the 
page shown in Figure 6-9. 


Figure 6-9 Purge Working Set 


[Screenshot: Availability Manager Fixes dialog, Process Memory tab, for process
DECW$SERVER_0 (00000066) on node ALMOST with the Purge WS fix type selected.
Explanation: "Purges the working set of the process. This fix reduces the
working set of the process to a minimal size." The dialog also shows a Return
Status area and OK, Cancel, and Apply buttons.]


After reading the explanation on the page, click Apply at the bottom of the page 
to apply the fix. A message displayed on the page indicates that the fix has been 
successful. 
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6.3.2.2 Adjust Working Set 


Adjusting the working set of a process might prove to be useful in a variety of 
situations. Two of these situations are described in the following list. 


• If a process is page-faulting because of insufficient memory, you can reclaim
  unused memory from other processes by decreasing the working set of one or
  more of them.

• If a process is page-faulting too frequently because its working set is too
  small, you can increase its working set.


Caution 


If the automatic working set adjustment is enabled for the system, a 
fix to adjust the working set size disables the automatic adjustment for 
the process. For more information, see OpenVMS online help for SET 
WORKING_SET/ADJUST, which includes /NOADJUST. 


When you select the Adjust Working Set fix, the Data Analyzer displays the page 
shown in Figure 6-10. 


Figure 6-10 Adjust Working Set 


[Screenshot: Availability Manager Fixes dialog, Process Memory tab, for process
DECW$SERVER_0 (00000066) on node ALMOST with the Adjust Working Set fix type
selected. The Fix Value is 7904. Explanation: "Adjusts the working set size of
a process. There are two caveats for this fix: This fix disables the automatic
working set adjustment for the process. The adjusted working set value cannot
exceed WSQUOTA for the process or WSMAX for the system. Memory is represented
in 512 byte units." The dialog also shows a Return Status area and OK, Cancel,
and Apply buttons.]


To perform this fix, use the slider to adjust the working set to the limit you want. 
You can also click the line above or below the slider to adjust the number by 1. 


When you are satisfied with the new working set limit, click Apply at the bottom 
of the page to apply the fix. A message displayed on the page indicates that the 
fix has been successful. 
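
The Adjust Working Set page states that memory is represented in 512-byte units and that the adjusted value cannot exceed WSQUOTA for the process or WSMAX for the system. The following Python sketch shows how a requested value relates to those limits and to a size in bytes. The function names and the WSQUOTA and WSMAX values are hypothetical; only the Fix Value of 7904 comes from the sample page.

```python
UNIT_BYTES = 512  # the fix page states that memory is represented in 512-byte units


def clamp_working_set(requested: int, wsquota: int, wsmax: int) -> int:
    """Limit a requested working set value (in 512-byte units) to WSQUOTA and WSMAX."""
    return min(requested, wsquota, wsmax)


def to_bytes(units: int) -> int:
    """Convert a working set value in 512-byte units to bytes."""
    return units * UNIT_BYTES


# 7904 is the Fix Value in the sample page; the two limits are hypothetical.
value = clamp_working_set(7904, wsquota=16384, wsmax=65536)
print(value, to_bytes(value))
```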
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6.3.3 Process Limits Fixes 


If a process is waiting for a resource, you can use a Process Limits fix to increase 
the resource limit so that the process can continue. The increased limit is in 
effect only for the life of the process, however; any new process is assigned the 
quota that was set in the UAF. 


When you click the Process Limits tab, you can select any of the following options: 


Direct I/O 
Buffered I/O 
AST 

Open File 
Lock 

Timer 
Subprocess 
I/O Byte 
Pagefile Quota 


These fix options are described in the following sections. 


6.3.3.1 Direct I/O Count Limit


You can use this fix to adjust the direct I/O count limit of a process. When 
you select the Direct I/O option, the Data Analyzer displays the page shown in 
Figure 6-11. 


Figure 6-11 Direct I/O Count Limit 


[Screenshot: Availability Manager Fixes dialog, Process Limits tab, for process
DECW$SERVER_0 (00000066) on node ALMOST with the Direct I/O fix type selected.
Explanation: "Adjusts the Direct I/O count limit of the process". The dialog
also shows a Fix Value slider, a Return Status area, and OK, Cancel, and Apply
buttons.]


To perform this fix, use the slider to adjust the direct I/O count to the limit you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new direct I/O count limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
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6.3.3.2 Buffered I/O Count Limit 


You can use this fix to adjust the buffered I/O count limit of a process. When 
you select the Buffered I/O option, the Data Analyzer displays the page shown in 
Figure 6-12. 


Figure 6-12 Buffered I/O Count Limit

[Screenshot: Process Limits page of the Availability Manager Fixes window with Buffered I/O selected as the Fix Type. Explanation text: "Adjusts the Buffered I/O count limit of the process."]


To perform this fix, use the slider to adjust the buffered I/O count to the limit you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new buffered I/O count limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
6.3.3.3 AST Queue Limit


You can use this fix to adjust the AST queue limit of a process. When you select 
the AST option, the Data Analyzer displays a page similar to the one shown in 
Figure 6-13. 


Figure 6-13 AST Queue Limit

[Screenshot: Process Limits page of the Availability Manager Fixes window with AST selected as the Fix Type. Explanation text: "Adjusts the AST Queue limit of the process."]


To perform this fix, use the slider to adjust the AST queue limit to the number 
you want. You can also click the line above or below the slider to adjust the 
number by 1. 


When you are satisfied with the new AST queue limit, click Apply at the bottom 
of the page to apply the fix. A message displayed on the page indicates that the 
fix has been successful. 
6.3.3.4 Open File Limit


You can use this fix to adjust the open file limit of a process. When you select the 
Open File option, the Data Analyzer displays a page similar to the one shown in 
Figure 6-14. 


Figure 6-14 Open File Limit

[Screenshot: Process Limits page of the Availability Manager Fixes window with Open File selected as the Fix Type. Explanation text: "Adjusts the Open File limit of the process."]


To perform this fix, use the slider to adjust the open file limit to the number you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new open file limit, click Apply at the bottom of 
the page to apply the fix. A message displayed on the page indicates that the fix 
has been successful. 
6.3.3.5 Lock Queue Limit


You can use this fix to adjust the lock queue limit of a process. When you select 
the Lock option, the Data Analyzer displays a page that is similar to the one 
shown in Figure 6-15. 


Figure 6-15 Lock Queue Limit

[Screenshot: Process Limits page of the Availability Manager Fixes window with Lock selected as the Fix Type. Explanation text: "Adjusts the Lock Queue limit of the process."]


To perform this fix, use the slider to adjust the lock queue limit to the number you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new lock queue limit, click Apply at the bottom 
of the page to apply the fix. A message displayed on the page indicates that the 
fix has been successful. 
6.3.3.6 Timer Queue Entry Limit


You can use this fix to adjust the timer queue entry limit of a process. When 
you select the Timer option, the Data Analyzer displays the page shown in 
Figure 6-16. 


Figure 6-16 Timer Queue Entry Limit

[Screenshot: Process Limits page of the Availability Manager Fixes window with Timer selected as the Fix Type. Explanation text: "Adjusts the Timer Queue entry limit of the process."]


To perform this fix, use the slider to adjust the timer queue entry limit to the 
number you want. You can also click the line above or below the slider to adjust 
the number by 1. 


When you are satisfied with the new timer queue entry limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 




6.3.3.7 Subprocess Creation Limit 


You can use this fix to adjust the subprocess creation limit of a process.
When you select the Subprocess option, the Data Analyzer displays the page 
shown in Figure 6-17. 


Figure 6-17 Subprocess Creation Limit

[Screenshot: Process Limits page of the Availability Manager Fixes window with Subprocess selected as the Fix Type. Explanation text: "Adjusts the Subprocess Creation limit of the process."]


To perform this fix, use the slider to adjust the subprocess creation limit of a 
process to the number you want. You can also click the line above or below the 
slider to adjust the number by 1. 


When you are satisfied with the new subprocess creation limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
6.3.3.8 I/O Byte


You can use this fix to adjust the I/O byte limit of a process. When you select the 
I/O Byte option on the movable bar, the Data Analyzer displays a page similar to 
the one shown in Figure 6-18. 


Figure 6-18 I/O Byte

[Screenshot: Process Limits page of the Availability Manager Fixes window with I/O Byte selected as the Fix Type. Explanation text: "Adjusts the Buffered I/O Byte limit of the process. The value entered will be rounded up to the next 64 byte boundary."]


To perform this fix, use the slider to adjust the I/O byte limit to the number you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new I/O byte limit, click Apply at the bottom of 
the page to apply the fix. A message displayed on the page indicates that the fix 
has been successful. 
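The I/O Byte fix page notes that the value entered is rounded up to the next 64-byte boundary. The following Python sketch illustrates only that arithmetic. The function name `round_up_to_64` is hypothetical, and the sketch assumes that a value already on a 64-byte boundary is left unchanged; it is not the Data Analyzer's own code.

```python
def round_up_to_64(value: int) -> int:
    """Round a byte count up to the nearest multiple of 64.

    Assumption: a value already on a 64-byte boundary is returned unchanged.
    """
    if value < 0:
        raise ValueError("I/O byte limit cannot be negative")
    return (value + 63) // 64 * 64


# Example: a requested limit of 50000 bytes is not a multiple of 64,
# so it is raised to the next boundary.
print(round_up_to_64(50000))  # prints 50048
```

Under this assumption, a requested limit of 50000 bytes would take effect as 50048 bytes.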


6.3.3.9 Pagefile Quota


You can use this fix to adjust the pagefile quota limit of a process. This quota
is shared among all the processes in a job and is measured in pagelets (512-byte
pages). When you select the Pagefile Quota option, the Data Analyzer displays
the page shown in Figure 6-19.


Figure 6-19 Pagefile Quota

[Screenshot: Process Limits page of the Availability Manager Fixes window with Pagefile Quota selected as the Fix Type. Explanation text: "Adjusts the Pagefile quota limit of the process."]


To perform this fix, use the slider to adjust the pagefile quota limit to the number
you want. You can also click above or below the slider to adjust the fix value
by 1 on VAX systems, or by the number of pagelets in a page for Alpha and I64
systems.


When you are satisfied with the new pagefile quota limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
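Because the pagefile quota is expressed in 512-byte pagelets and the click increment depends on the page size, a short calculation can help when choosing a value. The Python sketch below is illustrative only: the function names are hypothetical, and the 8192-byte page size used in the example is an assumed value, so check the actual page size of your system.

```python
PAGELET_BYTES = 512  # from the text: a pagelet is 512 bytes


def pagelets_to_bytes(pagelets: int) -> int:
    """Convert a quota expressed in pagelets to bytes."""
    return pagelets * PAGELET_BYTES


def click_increment(page_size_bytes: int) -> int:
    """Number of pagelets in one page, which is the amount a click changes."""
    if page_size_bytes <= 0 or page_size_bytes % PAGELET_BYTES:
        raise ValueError("page size must be a positive multiple of 512 bytes")
    return page_size_bytes // PAGELET_BYTES


print(pagelets_to_bytes(3125))   # quota of 3125 pagelets in bytes
print(click_increment(512))      # 512-byte pages (VAX): increment of 1
print(click_increment(8192))     # assumed 8192-byte page: increment of 16
```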
6.4 Performing Disk Fixes
Disk fixes fall into the following categories: 
• Forcing a disk volume out of a mount verify state
• Forcing a shadow set member out of a shadow set, allowing the shadow set to
  come out of a mount verify state and resume normal operations


To perform a disk fix, follow these steps:


1. On the Disk Status Summary or Disk Volume Summary page, select the Fix 
menu. 


2. Select Fix Options. 
6.4.1 Cancel Disk Volume Mount Verification


The default disk fix displayed is the Cancel Disk Mount Verification (MV) fix,
which forces a disk volume that is in a mount verify state into a mount verify
timeout state. This fix is the equivalent of the Interrupt Priority Level C (IPC)
mechanism used at system consoles for the same purpose.


The Cancel Disk Mount Verification (MV) fix is useful where disk volumes 
are mounted cluster-wide, and the host node for the disk volume fails. Once 
this fix is used on a disk volume, the disk then can be dismounted with a $ 
DISMOUNT/ABORT command. 


The Cancel Disk MV page is shown in Figure 6-20. 


Figure 6-20 Cancel Disk MV

[Screenshot: Disk page of the Availability Manager Fixes window with Cancel Disk MV selected as the Fix Type. Explanation text: "Cancel mount verify for this disk. This fix attempts to cancel mount verification for this disk and put the disk into the mount verify timeout state."]


After reading the explanation on the page, click Apply at the bottom of the page
to apply the fix. A message displayed on the page indicates that the fix has been
successful.
6.4.2 Cancel Shadow Set Mount Verification


The Cancel Shadow Set Mount Verification (SSM MV) fix forces the ejection of 
an unavailable shadow set member from a shadow set that is in a mount verify 
state. 


The Cancel SSM MV fix is useful to regain use of a shadow set that is in a mount 
verify state because a shadow set member resides on a host node that has failed. 
This is especially useful where the shadow set contains the System Authorization 
file, and having the shadow set in a mount verify state prevents logins to the 
node or cluster. 


This fix is equivalent to the $ SET SHADOW/FORCE_REMOVAL command.
The Cancel SSM MV page is shown in Figure 6-21. 


Figure 6-21 Cancel SSM MV

[Screenshot: Disk page of the Availability Manager Fixes window with Cancel SSM MV selected as the Fix Type. Explanation text: "Cancel mount verify for this shadow set member. This fix attempts to cancel mount verification for this shadow set member, and eject it from the shadow set. This is equivalent to the $ SET SHADOW/FORCE_REMOVAL command."]


After reading the explanation on the page, click Apply at the bottom of the page 
to apply the fix. A message displayed on the page indicates that the fix has been 
successful. 
6.5 Performing Cluster Interconnect Fixes
Note 


All cluster interconnect fixes require that managed objects be enabled. 


The following are categories of cluster interconnect fixes: 
• Port adjust priority fix
• Circuit adjust priority fix
• LAN virtual circuit (VC) summary fixes
• LAN channel (path) fixes
• LAN device fixes


The following sections describe these types of fixes. The descriptions also indicate 
whether or not the fix is currently available. 
6.5.1 Port Adjust Priority Fix


To access the Port Adjust Priority fix, right-click a data item in the Local Port 
Data display line (see Figure 4-3). The Data Analyzer displays a shortcut menu
with the Port Fix option. 


This page (Figure 6-22) allows you to change the cost associated with this port, 
which, in turn, affects the routing of cluster traffic. 


Figure 6-22 Port Adjust Priority

[Screenshot: SCA Port page of the Availability Manager Fixes window with Adjust Priority selected as the Fix Type. Explanation text: "Adjust the management priority for the Port. This fix changes the cost associated with this Port which in turn affects the routing of cluster traffic."]
6.5.2 Circuit Adjust Priority Fix


To access the Circuit Adjust Priority fix, right-click a data item in the circuits 
data display line (see Figure 4-4). The Data Analyzer displays a shortcut menu
with the Circuit Fix option. 


This page (Figure 6-23) allows you to change the cost associated with this
circuit, which, in turn, affects the routing of cluster traffic. Figures 6-23
through 6-34, as they apply to a Cluster over IP interface, are to be updated in
the next documentation update.


Figure 6-23 Circuit Adjust Priority

[Screenshot: SCA Circuit page of the Availability Manager Fixes window with Adjust Priority selected as the Fix Type. Explanation text: "Adjust the management priority for the Circuit. This fix changes the cost associated with this Circuit which in turn affects the routing of cluster traffic."]
6.5.3 LAN Virtual Circuit Fixes


To access LAN virtual circuit fixes, right-click a data item in the LAN Virtual 
Circuit Summary category (see Figure 4-6), or use the Fix menu on the LAN 
Device Details... page. 


The Data Analyzer displays a shortcut menu with the following options: 
• Channel Summary
• VC LAN Details...
• VC LAN Fix...


When you select VC LAN Fix..., the Data Analyzer displays the first of several 
fix pages. Use the Fix Type box to select one of the following LAN VC fixes: 


• Maximum Transmit Window Size
• Maximum Receive Window Size
• Checksumming
• Compression
• ECS Maximum Delay

These fixes are described in the following sections. 


6.5.3.1 LAN VC Checksumming Fix 


The LAN VC Checksumming fix (Figure 6-24) allows you to turn checksumming 
on or off for the virtual circuit. 


Figure 6-24 LAN VC Checksumming

[Screenshot: Virtual Circuit page of the Availability Manager Fixes window with Checksumming selected as the Fix Type. Explanation text: "Turn Checksumming on or off for the Virtual Circuit. This Virtual Circuit fix may not be available on all target systems."]
6.5.3.2 LAN VC Maximum Transmit Window Size Fix


The LAN VC Maximum Transmit Window Size fix (Figure 6-25) allows you to adjust the
maximum transmit window size for the virtual circuit. 


Figure 6-25 LAN VC Maximum Transmit Window Size

[Screenshot: Virtual Circuit page of the Availability Manager Fixes window with the maximum transmit window fix selected as the Fix Type. Explanation text: "Adjust the Maximum Transmit Window Size for the Virtual Circuit. This Virtual Circuit fix may not be available on all target systems."]
6.5.3.3 LAN VC Maximum Receive Window Size Fix


The LAN VC Maximum Receive Window Size fix (Figure 6-26) allows you to 
adjust the maximum receive window size for the virtual circuit. 


Figure 6-26 LAN VC Maximum Receive Window Size

[Screenshot: Virtual Circuit page of the Availability Manager Fixes window with the maximum receive window fix selected as the Fix Type. Explanation text: "Adjust the Maximum Receive Window Size for the Virtual Circuit. This Virtual Circuit fix may not be available on all target systems."]
6.5.3.4 LAN VC Compression Fix


The LAN VC Compression fix (Figure 6-27) allows you to turn compression on or 
off for the virtual circuit. This fix, however, might not be available on all target 
systems. 


Figure 6-27 LAN VC Compression

[Screenshot: Virtual Circuit page of the Availability Manager Fixes window with Compression selected as the Fix Type. Explanation text: "Turn Compression on or off for the Virtual Circuit. This Virtual Circuit fix may not be available on all target systems."]
6.5.3.5 LAN VC ECS Maximum Delay Fix


The LAN VC ECS Maximum Delay fix (Figure 6-28) sets a management-specified
limit on the maximum delay (in microseconds) an ECS member channel can have.
You can set a value between 0 and 3000000. Zero disables a prior management
delay setting.


You can use this fix to override the delay thresholds that PEdriver calculates
automatically. This ensures that all channels with delays less than the value
supplied are included in the VC's ECS.


Figure 6-28 LAN VC ECS Maximum Delay

[Screenshot: Virtual Circuit page of the Availability Manager Fixes window with the ECS Maximum Delay fix selected. Explanation text: "Sets a management specified lower bound on the maximum delay (in microseconds) an ECS member channel can have. Set a value between 0 and 3000000. Zero disables a prior management delay setting. You can use this command to override the PEdriver automatically calculated delay thresholds to ensure that all channels with delays less than the value supplied are included in the VC's ECS."]


On the sample page shown in Figure 6-28, you cannot read the following text
(which is displayed when you move the slider down): “The fix operates as
follows: Whenever at least one tight peer channel has a delay of less than
the management-supplied value, all tight peer channels with delays less than
the management-supplied value are automatically included in the ECS. When all
tight peer channels have delays equal to or greater than the management setting,
the ECS membership delay thresholds are automatically calculated and used.


You must determine an appropriate value for your configuration by 
experimentation. An initial value of 2000 (2ms) to 5000 (5ms) is suggested.” 


On this page, the following note of caution is also displayed: 


Caution 


By overriding the automatic delay calculations, you can include a channel 
in the ECS whose average delay is consistently greater than 1.5 to 2 times 
the average delay of the fastest channels. When this occurs, the overall 
VC throughput becomes the speed of the slowest ECS member channel. 
An extreme example is when the management delay permits a 10Mb/sec 
Ethernet channel to be included with multiple 1Gb/sec channels. The 
resultant VC throughput drops to 10Mb/sec. 
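The selection rule quoted from the fix page can be summarized in a short Python sketch. This is a simplified model, not PEdriver's implementation: the function name `ecs_members` and the `automatic_selection` callback (a stand-in for PEdriver's automatically calculated thresholds, whose details are not given here) are hypothetical.

```python
def ecs_members(delays_us, management_max_delay_us, automatic_selection):
    """Return the set of tight peer channels that would be in the ECS.

    delays_us: dict mapping channel name to its delay in microseconds.
    management_max_delay_us: fix value, 0 to 3000000; 0 disables the override.
    automatic_selection: hypothetical callback standing in for the
        thresholds that PEdriver calculates automatically.
    """
    if not 0 <= management_max_delay_us <= 3_000_000:
        raise ValueError("value must be between 0 and 3000000")
    if management_max_delay_us > 0:
        below = {name for name, delay in delays_us.items()
                 if delay < management_max_delay_us}
        if below:
            # At least one channel is under the management value, so all
            # channels under that value are included.
            return below
    # Override disabled, or every channel is at or above the management
    # value: fall back to the automatic calculation.
    return automatic_selection(delays_us)


# Stand-in for the automatic calculation: keep only the fastest channel.
fastest_only = lambda delays: {min(delays, key=delays.get)}
channels = {"A": 1000, "B": 2500, "C": 9000}
print(sorted(ecs_members(channels, 3000, fastest_only)))  # ['A', 'B']
```

As the caution explains, a large management value can admit a slow channel such as "C", and the VC throughput then drops to the speed of that channel.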
6.5.4 LAN Channel Fixes


To access LAN path fixes, right-click an item on a LAN Path (Channel) Summary 
line (see Figure 4-6). The Data Analyzer displays a shortcut menu with the 
following options: 


• Channel Details...
• LAN Device Details...
• Fixes...


Click Fixes... or use the Fix menu on the Channel Details page. The Data 
Analyzer displays a page with the following Fix Types: 


• Adjust Priority
• Hops
• Max Packet Size


These fixes are described in the following sections. 
6.5.4.1 LAN Path (Channel) Adjust Priority Fix


The LAN Path (Channel) Adjust Priority fix (Figure 6-29) allows you to change
the cost associated with this channel by adjusting its priority. This, in turn,
affects the routing of cluster traffic.


Figure 6-29 LAN/IP Path (Channel) Adjust Priority

[Screenshot: LAN Channel page of the Availability Manager Fixes window with Adjust Priority selected as the Fix Type. The explanation states that the fix sets the management priority value for the channel, that the priority can be a value between -128 and +127, and that suggested values are 2 to cause channels to be preferred and -2 to exclude channels. A caution warns that setting the priority of all channels and interfaces to -128 totally disables use of the LAN for cluster communication.]


6.5.4.2 LAN Path (Channel) Hops Fix 


The LAN Path (Channel) Hops fix (Figure 6-30) allows you to change the hops for the
channel. This change, in turn, affects the routing of cluster traffic. 


Figure 6-30 LAN/IP Path (Channel) Hops

[Screenshot: LAN Channel page of the Availability Manager Fixes window with the Hops fix selected. The explanation states that the fix adjusts the hops for the channel and that changing the hops value associated with the channel in turn affects the routing of cluster traffic.]
6.5.5 LAN Device Fixes


To access LAN device fixes, right-click an item in the LAN Path (Channel) 
Summary category (see Figure 4-6). The Data Analyzer displays a shortcut menu 
with the following options: 


• Channel Details...
• LAN Device Details...
• Fixes...


Select LAN Device Details to display the LAN Device Details window. From 
the Device Details window, select Fix... from the Fix menu. (These fixes are also 
accessible from the LAN Device Summary page.) 


The Data Analyzer displays the first of several pages, each of which contains a fix 
option: 

Adjust Priority
Set Max Buffer Size
Start LAN Device
Stop LAN Device


These fixes are described in the following sections. 
6.5.5.1 LAN Device Adjust Priority Fix
The LAN Device Adjust Priority fix (Figure 6-31) allows you to adjust the 
management priority for the device. This fix changes the cost associated with 
this device, which, in turn, affects the routing of cluster traffic. 


Starting with OpenVMS Version 7.3-2, a channel whose priority is -128 is 
not used for cluster communications. The priority of a channel is the sum of 
the management priority assigned to the local LAN device and the channel 
itself. Therefore, you can assign any combination of channel and LAN device 
management priority values to arrive at a total of -128. 
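The arithmetic described above can be checked with a small Python sketch. It is illustrative only: the function names are hypothetical, each management priority is assumed to lie in the range -128 to +127 shown on the fix pages, and the behavior for sums below -128 is not described in this section, so the sketch does not model it.

```python
EXCLUDED_PRIORITY = -128  # from the text: a channel at -128 is not used


def channel_priority(device_priority: int, channel_mgmt_priority: int) -> int:
    """Sum the local LAN device and channel management priorities."""
    for value in (device_priority, channel_mgmt_priority):
        if not -128 <= value <= 127:
            raise ValueError("management priority must be between -128 and +127")
    return device_priority + channel_mgmt_priority


def used_for_cluster_traffic(device_priority: int, channel_mgmt_priority: int) -> bool:
    """True unless the combined priority is exactly -128 (sketch assumption)."""
    return channel_priority(device_priority, channel_mgmt_priority) != EXCLUDED_PRIORITY


print(channel_priority(-100, -28))          # combined priority: -128
print(used_for_cluster_traffic(-100, -28))  # False: channel is excluded
print(used_for_cluster_traffic(2, 0))       # True: channel remains in use
```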


Figure 6-31 LAN/IP Device Adjust Priority

[Screenshot: IP Interface page of the Availability Manager Fixes window with Adjust Priority selected as the Fix Type. The explanation states that the fix sets the management priority value for the IP interface, that the priority can be a value between -128 and +127, and that suggested values are 2 to cause interfaces to be preferred and -2 to exclude interfaces. A caution warns that setting the priority of all devices and interfaces to -128 totally disables use of IP for cluster communication.]
6.5.5.2 LAN Device Set Maximum Buffer Fix
The LAN Device Set Maximum Buffer fix (Figure 6-32) allows you to set the
maximum packet size for the device, which changes the maximum packet size
associated with this channel. This change, in turn, affects the routing of cluster
traffic.


Figure 6-32 LAN Device Set Maximum Buffer Size

[Screenshot: IP Interface page of the Availability Manager Fixes window with the maximum packet size fix selected. The explanation states that the fix sets the maximum packet size for the IP interface and that changing the maximum packet size associated with the channel in turn affects the routing of cluster traffic.]
6.5.5.3 LAN Device Start Fix
The LAN Device Start fix (Figure 6-33) starts the use of this particular LAN 
device. This fix allows you, at the same time, to enable this device for cluster 
traffic. 


Figure 6-33 LAN/IP Device Start

[Screenshot: IP Interface page of the Availability Manager Fixes window with Start IP Interface selected as the Fix Type. The explanation states that the fix starts use of the IP interface and enables it for cluster traffic.]
6.5.5.4 LAN Device Stop Fix
The LAN Device Stop fix (Figure 6-34) stops the use of this particular LAN
device. At the same time, this fix disables this device for cluster traffic. 


Caution 


This fix could result in interruption of cluster communications for this 
node. The node might exit the cluster (CLUEXIT crash). 


Figure 6-34 LAN/IP Device Stop

[Screenshot: IP Interface page of the Availability Manager Fixes window with Stop IP Interface selected as the Fix Type. The explanation states that the fix stops use of the IP interface and disables it for cluster traffic. A caution warns that the fix could interrupt cluster communications for the node and cause it to exit the cluster (CLUEXIT crash).]
Customizing the Availability Manager Data Analyzer


This chapter explains how to customize the following Availability Manager Data
Analyzer features:


Feature                 Description

Nodes or node groups    You can select one or more groups or individual nodes to
                        monitor.

Data collection         For OpenVMS nodes, you can choose the types of data you
                        want to collect as well as set several types of
                        collection intervals. (On Windows nodes, specific types
                        of data are collected by default.)

Data filters            For OpenVMS nodes, you can specify a number of parameters
                        and values that limit the amount of data that is
                        collected.

Event escalation        You can customize the way events are displayed in the
                        Event pane of the System Overview window (Figure 2-25),
                        and you can configure events to be signaled to OPCOM and
                        OpenView.

Event filters           You can specify the severity of events that are displayed
                        as well as several other filter settings for events.

Security                On Data Analyzer and Data Collector nodes, you can change
                        passwords. On OpenVMS Data Collector nodes, you can edit
                        a file that contains security triplets.

Watch process           You can specify up to eight processes for the Data
                        Analyzer to monitor and report on if they exit and also
                        if they subsequently are created.


In addition, you can change the group membership of nodes, as explained in 
Section 7.4.1 and Section 7.4.2. 


Table 7-1 shows the levels of customization the Data Analyzer provides. At each 
level, you can customize specific features. The table shows the features that can 
be customized at each level. 
Table 7-1 Levels of Customization

Customizable Features    Application    Operating System    Group    Node

Nodes or node groups     X
Data collection                         X                   X        X
Data filters                            X                   X        X
Event escalation         X              X                   X        X
Event filters                           X                   X        X
Security                                X                   X        X
Watch process                           X                   X        X


7.1 Understanding Levels of Customization 


You can customize each feature at one or more of the following levels, as shown 
in Table 7-1: 


• Application
• Operating System
• Group
• Node


In addition to the four levels of customization are Availability Manager Data 
Analyzer Defaults (AM Defaults), which are top-level, built-in values that are 
preset (hardcoded) within the Availability Manager Data Analyzer. Users cannot 
change these settings themselves. If no customizations are made at any of the 
four levels, the AM Default values are used. 


The following list describes the four levels of customization. 


• Application values override AM Defaults for nodes and groups of nodes as
  well as event escalation (unless overriding customizations are made at the
  operating system, group, or node levels).

• Operating System values override Application values for event escalation.
  Operating System values override AM Defaults for the remaining features
  shown in Table 7-1.

• Group values override Operating System and Application values as well as
  AM Defaults.

• Node values override Group, Operating System, and Application values, as
  well as AM Defaults.


Any of these four levels of customization overrides AM Defaults. Also, 
customizing values at any successive level overrides the value set at the previous 
level. For example, customizing values for Data filters at the Group level 
overrides values for Data filters set at the Operating System level. Similarly, 
customizing values for Data filters at the Node level overrides values for Data 
filters set at the Group level. 
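The override order described above amounts to a lookup from the most specific level to the least specific one. The Python sketch below models that order only; the level names, the feature name `disk_status_interval`, and the numeric values are invented for illustration and are not Data Analyzer identifiers or defaults.

```python
# Most specific level first. Names are illustrative only.
LEVELS = ("node", "group", "operating_system", "application")


def effective_setting(feature, customizations, am_defaults):
    """Return (value, level) for a feature, honoring the override order."""
    for level in LEVELS:
        values = customizations.get(level, {})
        if feature in values:
            return values[feature], level
    # No customization at any level: the built-in AM Default applies.
    return am_defaults[feature], "am_defaults"


am_defaults = {"disk_status_interval": 15}          # made-up built-in value
customizations = {
    "operating_system": {"disk_status_interval": 30},
    "group": {"disk_status_interval": 60},
}
print(effective_setting("disk_status_interval", customizations, am_defaults))
# prints (60, 'group')
```

In this model, removing the Group entry would make the Operating System value take effect, which mirrors resetting a value to the one set at the next higher level.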
The customization levels for various Data Analyzer values are displayed as icons
on some pages. The OpenVMS Data Collection Customization page (Figure 7-1)
displays several of these icons.


Figure 7-1 OpenVMS Data Collection Customization

[Screenshot: Customization settings page for an OpenVMS node, showing the Data Collection & Update Intervals list (including CPU mode, Cluster summary, Disk status, CPU process, Disk volume, Lock contention, Memory, Node summary, Page/swap file, Single disk, and Single process), each item preceded by a customization-level icon. The text at the bottom of the page reads: "Icons are used to indicate the current customization level in effect," followed by a legend for settings from the Availability Manager built-in set and from the Application level.]


The icons preceding each data item in Figure 7-1 indicate the current
customization level for each collection choice. Table 7-2 describes these icons
and tells where each appears in Figure 7-1.


Table 7-2 Customization Icons in Figure 7-1

Icon               Location                    Meaning

Graph              Before “Disk volume”        Current setting is from the built-in AM
                                               Defaults.

Magnifying glass   Bottom left of window       Current setting is from the Application
                                               level.

Swoosh             Before “Disk status”        Current setting has been modified at the
                                               OpenVMS Operating System level.

Double monitors    Before “Cluster summary”    Current setting has been modified at the
                                               group level.

Single monitor     Before “Memory”             Current setting has been modified at the
                                               node level.
7.1.2 Setting Levels of Customization


When you customize values, the Data Analyzer keeps track of the next higher 
level of each value. This means that you can reset a value to the value set at the 
next higher level. 


To return to the values set at the preceding level, click the Use default 
values button at the top of a customization page. The icon on the “Use default 
values” button and explanation at the bottom of the page indicate the previous 
customization level. 


In the main System Overview window (see Figure 2-25), you can select the
customization levels that are shown in Table 7-1. The following sections explain 
levels of customization in more detail. 


7.1.3 Knowing the Number of Nodes Affected by Each Customization Level


Another way of looking at Data Analyzer customization is to consider the 
number of nodes affected by each level of customization. Depending on which 
customization menu you use and your choice of menu items, your customizations 
can affect one or more nodes, as indicated in the following table. 


Nodes Affected        Action

All nodes             Select Customize Application... on the menu shown in
                      Figure 7-2.

All Windows nodes     Select Operating Systems -> Customize Windows NT... on
                      the menu shown in Figure 7-2.

All OpenVMS nodes     Select Operating Systems -> Customize OpenVMS... on
                      the menu shown in Figure 7-2.

Nodes in a group      Select Customize... on the shortcut menu shown in
                      Figure 7-7. The customization options you choose affect
                      only the group of nodes that you select.

One node              Select Customize... on the shortcut menu shown in
                      Figure 7-8 or on the Customize shortcut menu on the Node
                      page. The customization options you choose affect only
                      the node that you select.


7.2 Customizing Settings at the Application and Operating System Levels


In the System Overview window menu bar, select Customize. The Data Analyzer 
displays the shortcut menu shown in Figure 7-2. 


Figure 7-2 Application and Operating System Customization Menu

[Screenshot: Customize menu showing the Customize Application... and Customize OpenVMS... items.]
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7.2.1 Customizing Application Settings 


When you select Customize Application..., the Data Analyzer displays the
Group/Node Lists page (Figure 7-3) with the Inclusion lists tab selected by
default.


Note 


The Event Escalation tab displayed on the Application Settings page 
(Figure 7-3) is explained in Section 7.7. 


7.2.1.1 Application Settings—Groups/Nodes Inclusion Page 


On the Groups/Nodes Inclusion page (Figure 7-3) you can select groups of nodes 
or individual nodes to be displayed. 


Figure 7-3 Application Settings—Groups/Nodes Inclusion 


[Figure: Customization - Application Settings page, Group/Node Lists tab,
Inclusion lists subtab. The "Groups/Nodes to display" area contains the
Group List and Node List check boxes, list entries such as DECAMDS,
Debug cluster, KOINE, and KOINE2, and a "Use default values" button. The
Explanation panel reads: When the "Group List" checkbox is checked for
groups or "Node List" for nodes, only the groups or nodes in the checked
lists are monitored. If both checkboxes are unchecked, then all groups and
nodes will be monitored.]


On the Groups/Nodes Inclusion page, you have the following choices:


• Group List

  Select the Group List check box. Then enter the names of the groups of
  nodes you want to monitor. (The names are case-sensitive, so be sure to enter
  the correct case.)

  For instructions for changing the group membership of a node, see
  Section 7.4.1 and Section 7.4.2.

• Node List

  Select the Node List check box. Then enter the names of individual nodes
  you want to monitor. (The names are case-sensitive, so be sure to enter the
  correct case.)

• Both Group List and Node List

  If you select both check boxes, you can enter the names of groups of nodes as
  well as individual nodes you want to monitor. (If you enter the name of an
  individual node, the Data Analyzer displays the name of the group that the
  node is in, but no additional nodes in that group.)

• Neither list

  The Group List and Node List are not used; all groups and all nodes are
  monitored.
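

The four choices above amount to a small decision rule, sketched here. The function name, its parameters, and the node and group names in the usage are hypothetical; the rule itself (case-sensitive names, a node passing if either selected list names it or its group) is one reading of the descriptions above.

```python
def is_monitored(node, group, group_list=None, node_list=None):
    """Decide whether a node passes the inclusion lists.

    group_list and node_list are None when the corresponding check box is
    cleared, or a collection of names when it is selected. Name comparison
    is case-sensitive.
    """
    if group_list is None and node_list is None:
        return True  # neither list is used: everything is monitored
    in_groups = group_list is not None and group in group_list
    in_nodes = node_list is not None and node in node_list
    return in_groups or in_nodes
```

For example, `is_monitored("NODE1", "decamds", group_list={"DECAMDS"})` is false in this sketch because the group name differs in case.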


If you decide to return to the default (Group List: DECAMDS) or to enter names 
again, select Use default values. 


After you enter a list of nodes or groups of nodes, click one of the following 
buttons at the bottom of the page: 


Option Description 

OK Accepts the choice of names you have entered and exits the page. 

Cancel Cancels the choice of names and does not exit the page. 

Apply Accepts the choice of names you have entered but does not exit the 
page. 


If nodes were previously selected for monitoring, their names are not removed 
from the display even if you click OK or Apply. They are filtered out the next 
time the Data Analyzer is started. 


7.2.1.2 Application Settings—Groups/Nodes Exclusion Lists 


As an alternative to the Inclusion lists on the Groups/Nodes Inclusion page, you
can click the Exclusion lists tab (Figure 7-4), where you can select groups of
nodes or individual nodes to be excluded from display.


Figure 7-4 Application Settings—Groups/Nodes Exclusion Lists


[Figure: Customization - Application Settings page, Group/Node Lists tab,
Exclusion lists subtab. The "Groups/Nodes to exclude" area contains the
Group List and Node List check boxes and a "Use default values" button. The
Explanation panel reads: When the "Group List" checkbox is checked for
groups or "Node List" for nodes, the groups or nodes in the checked lists
are not monitored. If both checkboxes are unchecked, then no groups or
nodes will be excluded.]


On the Groups/Nodes Exclusion Lists page, you have the following choices: 

• Group List

  Select the Group List check box. Then enter the names of the groups of
  nodes you want to exclude from monitoring. (The names are case-sensitive, so
  be sure to enter the correct case.)

  For instructions on changing the group membership of a node, see
  Section 7.4.1 and Section 7.4.2.

• Node List

  Select the Node List check box. Then enter the names of individual nodes
  you want to exclude from monitoring. (The names are case-sensitive, so be
  sure to enter the correct case.)

• Both Group List and Node List

  If you select both check boxes, you can enter the names of groups of nodes as
  well as individual nodes you want to exclude from monitoring. (If you enter
  the name of an individual node, the Data Analyzer displays the name of the
  group that the node is in, but no additional nodes in that group.)

• Neither box

  The Group List and Node List are not used; no groups or nodes are
  excluded, so all groups and all nodes are monitored.


After you enter a list of nodes or groups of nodes, click one of the buttons at the
bottom of the page:


Option   Description

OK       Accepts the choice of names you have entered and exits the page.

Cancel   Cancels the choice of names and does not exit the page.

Apply    Accepts the choice of names you have entered but does not exit the
         page.


If nodes were previously selected for monitoring, their names are not removed 
from the display even if you click OK or Apply to exclude them from monitoring. 


7.2.2 Customizing Windows Operating System Settings 


When you select Customize Windows NT..., the Data Analyzer displays a page 
similar to the one shown in Figure 7-5. 


Figure 7-5 Windows Operating System Customization


[Figure: Windows operating system customization page, showing the Event
Customization tab with its "Event explanation and investigation hints"
panel and the OK, Cancel, Apply, and Help buttons. The level indicator
reads "Global Windows NT".]


The default page displayed is the Event Customization page. Instructions for 
using this page are in Section 7.8.1. The other tabs displayed are the Event 
Escalation page, which is explained in Section 7.7, and the Windows Security 
Customization page, which is explained in Section 7.9.2.2. 


7.2.3 Customizing OpenVMS Operating System Settings 


When you select Customize OpenVMS..., the Data Analyzer displays the page
shown in Figure 7-6, which contains tabs for the last six types of customization
listed in Table 7-1. (Instructions for making these types of customizations are
later in this chapter, beginning in Section 7.5.)


Figure 7-6 OpenVMS Operating System Customization


[Figure: OpenVMS customization page. Icons indicate the customization level
currently in effect: one icon indicates that the current settings are from
the Availability Manager built-in set, and another indicates that the
current settings are from the Application level. The level indicator reads
"Global OpenVMS".]


7.3 Customizing Settings at the Group Level 


To perform customizations at the group level, right-click a group name in the
System Overview window. The Data Analyzer displays a small menu similar to
the one shown in Figure 7-7.


Figure 7-7 Group Customization Menu


When you select Customize, the Data Analyzer displays a page similar to the
one shown in Figure 7-6.


7.4 Customizing Settings at the Node Level


To customize a specific node, do either of the following:

• Select the Customize option at the top of the Group/Node page.

• Right-click a node name in the Node pane of the System Overview window
  (see Figure 2-25).


The Data Analyzer displays the shortcut menu shown in Figure 7-8. 


Note 


You can customize nodes in any state. 


Figure 7-8 Node Customization Menu


When you select Customize, the Data Analyzer displays a customization page
similar to the one shown in Figure 7-6.


7.4.1 Changing the Group of an OpenVMS Node 


Each Availability Manager Data Collector node is assigned to the DECAMDS 
group by default. 


Note 


You need to place nodes that are in the same cluster in the same group. 
If such nodes are placed in different groups, some of the data collected 
might be misleading. 


To change the group for a node, edit a logical name on that Data Collector
node. Follow these steps:


1. Assign a unique name of up to 15 alphanumeric characters
   to the AMDS$GROUP_NAME logical name in the
   AMDS$AM_SYSTEM:AMDS$LOGICALS.COM file. For example:

   $ AMDS$DEF AMDS$GROUP_NAME FINANCE ! Group FINANCE; OpenVMS Cluster alias

2. Apply the logical name by restarting the Data Collector:

   $ @SYS$STARTUP:AMDS$STARTUP RESTART
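

Before editing the logicals file, it can help to check a proposed group name against the rule in step 1. The following is a minimal sketch; the helper names are hypothetical, and it assumes that "alphanumeric" means the ASCII letters and digits only.

```python
import re

# Assumed rule, taken from step 1: 1 to 15 alphanumeric characters.
_GROUP_NAME = re.compile(r"[A-Za-z0-9]{1,15}")


def is_valid_group_name(name):
    """Return True if the name is 1 to 15 ASCII letters or digits."""
    return _GROUP_NAME.fullmatch(name) is not None


def group_definition_line(name):
    """Build the DCL line from step 1 for a validated group name."""
    if not is_valid_group_name(name):
        raise ValueError(f"invalid group name: {name!r}")
    return f"$ AMDS$DEF AMDS$GROUP_NAME {name}"
```

The generated line still has to be placed in the AMDS$LOGICALS.COM file by hand, and the Data Collector restarted as in step 2.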


7.4.2 Changing the Group of a Windows Node


Note 


These instructions apply to versions prior to Version 2.0-1. 


You need to edit the Registry to change the group of a Windows node. To edit the 
Registry, follow these steps: 


1. Click the Windows Start button. On the menu displayed, first select
   Programs, then Accessories, and then Command Prompt.

2. Type REGEDIT after the angle prompt (>).

   The system displays a screen for the Registry Editor, with a list of entries
   under My Computer.

3. On the list displayed, expand the HKEY_LOCAL_MACHINE entry.

4. Double-click SYSTEM.

5. Click CurrentControlSet.

6. Click Services.

7. Click damdrvr.

8. Click Parameters.

9. Double-click Group Name. Then type a new group name of 15 alphanumeric
   characters or fewer, and click OK to make the change.

10. On the Control Panel, select Services, and then select Stop for “PerfServ.”

11. Again on the Control Panel, select Devices, and then select Stop for
    “damdrvr.”

12. First restart damdrvr under “Devices,” and then restart PerfServ under
    “Services.”

    This step completes the change of groups for this node.


7.5 Customizing OpenVMS Data Collection 


Note 


Before you start this section, be sure to read the explanation of data 
collection, events, thresholds, and occurrences in Chapter 1. Also, be sure 
you understand background and foreground data collection. 


When you choose the Customize OpenVMS menu option in the System
Overview window (see Figure 7-2), by default the Data Analyzer displays the
OpenVMS Data Collection Customization page (Figure 7-9) where you can select
types of data you want to collect for all of the OpenVMS nodes you are currently
monitoring. You can also change the default Data Analyzer intervals at which
data is collected or updated.


Figure 7-9 OpenVMS Data Collection Customization


[Figure: OpenVMS Data Collection Customization page. Icons indicate the
customization level currently in effect: one icon indicates that the current
settings are from the Availability Manager built-in set, and another
indicates that the current settings are from the Application level.]


Table 7-3 identifies the page on which each type of data collected and displayed
in Figure 7-9 appears and indicates whether or not background data collection 
is turned on for that type of data collection. See Chapter 1 for information about 
background data collection. (You can also customize data collection at the group 
and node levels, as explained in Section 7.1.) 


Note 


When you select a type of data collection, an icon appears on the
“Use default values” button indicating the previous (higher) level of
customization where customizations might have been made. Pressing the
“Use default values” button followed by the “Apply” button causes any
customizations made at the current level to be discarded and the values
from the previous level to be used.


You can select more than one collection choice using the Shift and/or Ctrl
keys. In this case, none of the icons appear on the “Use default values”
button. Pressing the “Use default values” button causes each selected
collection choice to be reset to the value at its own previous level of
customization.


Table 7-3 Data Collection Choices


                   Background
                   Data
                   Collection
Data Collected     Default      Page Where Data Is Displayed

Cluster summary    No           Cluster Summary page
CPU mode           No           CPU Modes Summary page
CPU summary        No           CPU Process States page
Disk status        No           Disk Status Summary page
Disk volume        No           Disk Volume Summary page
I/O data           No           I/O Summary page
Lock contention    No           Lock Contention page
Memory             No           Memory Summary page
Node summary       Yes          Node pane, Node Summary page, and the top
                                pane of the CPU, Memory, and I/O pages
Page/Swap file     No           I/O Page Faults page
Single disk        Yes (1)      Single Disk Summary page
Single process     Yes (2)      Data collection for the Process Information page

(1) Data is collected by default when you open a Single Disk Summary page.
(2) Data is collected by default when you open a Single Process page.


You can choose additional types of background data collection by selecting the 
Collect check box for each one on the Data Collection Customization page of the 
Customize OpenVMS... menu (Figure 7-6). A check mark indicates that data 
is to be collected at the intervals described in Table 7-4. 


Note 


For accurate evaluation of events that require cluster-wide data collection
(lock contention, disk status, and disk volume), it is recommended that you
turn on background data collection for these data types at the OpenVMS
Group level. This is described in Section 7.3.


Table 7-4 Data Collection Intervals 


Interval Name   Description

Display         How often the data is collected when its corresponding display is
                active.

Event           How often the data is collected when its corresponding display is
                not active and when events are active.

NoEvent         How often the data is collected when its corresponding display is
                not active and when events are not active.
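

The way the three intervals in Table 7-4 apply can be sketched as a simple selection rule. The function name and the interval values in the usage are hypothetical; only the three interval names and their conditions come from the table.

```python
def collection_interval(intervals, display_active, events_active):
    """Pick the interval that applies, following the rules in Table 7-4.

    `intervals` maps the interval names "Display", "Event", and "NoEvent"
    to values in seconds.
    """
    if display_active:
        return intervals["Display"]
    if events_active:
        return intervals["Event"]
    return intervals["NoEvent"]
```

With illustrative intervals of 5, 10, and 30 seconds, a closed display with active events would be sampled every 10 seconds in this sketch.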


You can enter a different collection interval by selecting a row of data and
selecting a value. Then delete the old value and enter a new one.


If you change your mind and decide to return to the default collection interval,
select one or more rows of data items; then select Use default values. The
system displays the default values for all the collection intervals.


When you finish customizing your data collection, click one of the following 
buttons at the bottom of the page: 


Option Description 

OK To confirm any changes you have made and exit the page. 

Cancel To cancel any changes you have made and exit the page. 

Apply To confirm and apply any changes you have made and not exit the page. 


7.6 Customizing OpenVMS Data Filters 


When you choose “Customize” at the operating system, group, or node level and 
then select the Filter tab, the Data Analyzer displays pages that allow you to 
customize data (see Figure 7-10). The types of data filters available are the 
following: 


• CPU

• Disk Status

• Disk Volume

• I/O

• Lock Contention

• Memory

• Page/Swap File


Filters can vary depending on the type of data collected. For example, filters 
might be process states or a variety of rates and counts. The following sections 
describe data filters that are available for various types of data collection. 


You can also customize filters at the group and node levels (see Section 7.1). 


Keep in mind that the customizations that you make at the various levels
override the ones set at the previous level (see Table 7-1). The icons preceding
each data item (see Table 7-2) indicate the level at which the data item was
customized. In Figure 7-10, for example, the icon preceding “CPU” indicates that
the current setting comes from the AM Defaults.


If you change your mind and decide to return to filter values set at the previous 
level, select Use default values. The icon appearing on the button indicates the 
level of the previous values. In Figure 7-10, for example, the previous value is 
the AM Defaults value. 


When you finish modifying filters on a page, click one of the following buttons at 
the bottom of the page: 


Option   Description

OK       To confirm any changes you have made and exit the page.

Cancel   To cancel any changes you have made and exit the page.

Apply    To confirm and apply any changes you have made and continue to display the
         page.


7.6.1 OpenVMS CPU Filters 


When you select “CPU” on the Filter tabs, the Data Analyzer displays the 
OpenVMS CPU Filters page (Figure 7-10). 


Figure 7-10 OpenVMS CPU Filters


The OpenVMS CPU Filters page allows you to change and select values that are
displayed on the OpenVMS CPU Process States page (Figure 3-8).


You can change the Current Priority and CPU rate values that determine which
processes are displayed. By default, a process is displayed only if it has a
Current Priority of 4 or more. Click the up or down arrow to increase or
decrease the priority value by one. The default CPU rate is 0.0, which means
that processes with any CPU rate used will be displayed. To limit the number of
processes displayed, you can click the up or down arrow to increase or decrease
the CPU rate by 0.5 each time you click.


The OpenVMS CPU Filters page also allows you to select the states of the
processes that you want to display on the CPU Process States page. Select the
check box for each state you want to display. (Process states are described in
Appendix A.)
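

Taken together, the three CPU filters act as one test on each process. The following is a minimal sketch under stated assumptions: the `Process` record and the function name are hypothetical, the comparisons are assumed to be "greater than or equal to", and passing no state set is assumed to mean that every state is selected.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    priority: int    # Current Priority
    cpu_rate: float  # CPU rate used
    state: str       # process state, for example "COM" or "HIB"


def passes_cpu_filter(proc, min_priority=4, min_cpu_rate=0.0, states=None):
    """Return True if the process would be displayed under these filters."""
    if proc.priority < min_priority:
        return False
    if proc.cpu_rate < min_cpu_rate:
        return False
    # states=None is taken here to mean that no state is filtered out.
    return states is None or proc.state in states
```

Raising `min_cpu_rate` in 0.5 steps, as the arrows on the page do, progressively hides less active processes.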


7.6.2 OpenVMS Disk Status Filters


When you select Disk Status on the Filter tabs, the Data Analyzer displays the 
OpenVMS Disk Status Filters page (Figure 7-11). 


Figure 7-11 OpenVMS Disk Status Filters


The OpenVMS Disk Status Summary page (Figure 3-14) displays the values you
set on this page.


This page lets you change the following default values:


Data           Description

Error Count    The number of errors generated by the disk (a quick indicator of
               device problems).

Transaction    The number of in-progress file system operations for the disk.

Mount Count    The number of nodes that have the specified disk mounted.

RWAIT Count    An indicator that a system I/O operation is stalled, usually during
               normal connection failure recovery or volume processing of
               host-based shadowing.

This page also lets you check the states of the disks you want to display, as 
described in the following table: 


Disk State      Description

Invalid         Disk is in an invalid state (Mount Verify Timeout is likely).

Shadow Member   Disk is a member of a shadow set.

Unavailable     Disk is set to unavailable.

Wrong Vol       Disk was mounted with the wrong volume name.

Mounted         Disk is logically mounted by a MOUNT command or a service call.

Mount Verify    Disk is waiting for a mount verification.

Offline         Disk is no longer physically mounted in device drive.

Online          Disk is physically mounted in device drive.


7.6.3 OpenVMS Disk Volume Filters 


When you select Disk Volume on the Filter tabs, the Data Analyzer displays 
the OpenVMS Disk Volume Filters page (Figure 7-12). 


Figure 7-12 OpenVMS Disk Volume Filters 


[Figure: Customization - OpenVMS Default Settings page, Disk Volume filters.
The page shows fields for Used Blocks, Disk % Used, Free Blocks, Queue
Length, and Operations Rate; "Show devices" check boxes for RAMdisks,
Sec. Page/Swap, and Wrtlocked Volumes; an Exclude Devices text box with a
"Use device filter" check box; and a "Use default values" button. The level
indicator reads "Global OpenVMS".]


The OpenVMS Disk Volume Filters page allows you to change the values for the
following data:


Data              Description

Used Blocks       The number of volume blocks in use.

Disk % Used       The percentage of the number of volume blocks in use in relation to
                  the total volume blocks available.

Free Blocks       The number of blocks of volume space available for new data.

Queue Length      Current length of I/O queue for a volume.

Operations Rate   The rate at which the operations count to the volume has changed
                  since the last sampling. The rate measures the amount of activity
                  on a volume. The optimal load is device specific.


You can also change options for the following to be on (checked) or off (unchecked):

• RAMdisks: Show devices

• Sec. Page/Swap: Show devices

  Secondary Page or Swap devices are disk volumes that have “PAGE” or
  “SWAP” in the volume name. This filter is useful for filtering out disks that
  are used only as page or swap devices.

• Wrtlocked Volumes: Show devices (for example, CDROM devices)

• Exclude Devices: Use device filter

  You can exclude specific disk volumes by listing them in the Exclude Devices
  text box. You can use wildcards to specify the disk volumes. Four examples
  are shown in Figure 7-12.
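

The Sec. Page/Swap and Exclude Devices options can be illustrated with a short sketch. Several points are assumptions: this guide does not define the wildcard syntax, so shell-style patterns are used here; the patterns are matched against the device name; and the function names, device names, and volume names are invented for illustration.

```python
from fnmatch import fnmatchcase


def is_secondary_page_swap(volume_name):
    """A volume counts as a page or swap device if its name says so."""
    return "PAGE" in volume_name or "SWAP" in volume_name


def show_volume(device, volume_name, show_page_swap=True, exclude_patterns=()):
    """Return True if the volume would remain on the display."""
    if not show_page_swap and is_secondary_page_swap(volume_name):
        return False
    # Assumed: shell-style wildcards, matched case-sensitively.
    return not any(fnmatchcase(device, pattern) for pattern in exclude_patterns)
```

For example, the hypothetical pattern `$1$DGA*` would hide every device whose name begins with `$1$DGA` while leaving other devices visible.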


7.6.4 OpenVMS I/O Filters 


When you select I/O on the Filter tabs, the Data Analyzer displays the OpenVMS 
I/O Filters page (Figure 7-13). 


Figure 7-13 OpenVMS I/O Filters


The OpenVMS I/O Summary page (Figure 3-12) displays the values you set on
this filters page.


This filters page allows you to change values for the following data:


Data                Description

Direct I/O Rate     The rate of direct I/O transfers. Direct I/O is the average percentage
                    of time that the process waits for data to be read from or written to
                    a disk or tape. The possible state is DIO. Direct I/O is usually disk
                    or tape I/O.

Buffered I/O Rate   The rate of buffered I/O transfers. Buffered I/O is the average
                    percentage of time that the process waits for data to be read from or
                    written to a slower device such as a terminal, line printer, or mailbox.
                    The possible state is BIO. Buffered I/O is usually terminal, printer
                    I/O, or network traffic.

Paging I/O Rate     The rate of read attempts necessary to satisfy page faults (also
                    known as Page Read I/O or the Hard Fault Rate).

Open File Count     The number of open files.

BIO lim Remaining   The number of remaining buffered I/O operations available before
                    the process reaches its quota. BIOLM quota is the maximum
                    number of buffered I/O operations a process can have outstanding
                    at one time.

DIO lim Remaining   The number of remaining direct I/O limit operations available
                    before the process reaches its quota. DIOLM quota is the maximum
                    number of direct I/O operations a process can have outstanding at
                    one time.

BYTLM Remaining     The number of buffered I/O bytes available before the process
                    reaches its quota. BYTLM is the maximum number of bytes of
                    nonpaged system dynamic memory that a process can claim at one
                    time.

Open File limit     The number of additional files the process can open before reaching
                    its quota. FILLM quota is the maximum number of files that can
                    be opened simultaneously by the process, including active network
                    logical links.


7.6.5 OpenVMS Lock Contention Filters 


The OpenVMS Lock Contention Filters page allows you to remove (filter out) 
resource names from the Lock Contention page (Figure 3-19). 


When you select Lock Contention on the Filter tabs, the Data Analyzer
displays the OpenVMS Lock Contention Filters page (Figure 7-14).


Figure 7-14 OpenVMS Lock Contention Filters


Each entry on the Lock Contention Filters page is a resource name or part of a
resource name that you want to filter out. For example, the STRIPE$ entry filters 
out any value that starts with the characters STRIPE$. In the example of | ** in 
Figure 7-14, the two asterisks are literal asterisks, not wildcard characters. 


For resources that contain byte values that are not printable, the Hex Edit pane 
at the bottom of the Lock Contention Filters page allows you to enter these byte 
values in hexadecimal. 
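

The matching rule described above (an entry filters out any resource name that starts with it, and asterisks are literal characters) can be sketched as follows. Resource names are modeled as byte strings so that nonprintable values entered in hexadecimal fit the same rule. The function names and the sample entries are hypothetical.

```python
def parse_hex_entry(text):
    """Turn hexadecimal text such as '53 54' into the raw bytes it encodes."""
    return bytes.fromhex(text)


def is_filtered_out(resource_name, entries):
    """Return True if the resource name starts with any filter entry.

    Entries are compared literally: an asterisk in an entry matches only
    an asterisk in the resource name, not arbitrary characters.
    """
    return any(resource_name.startswith(entry) for entry in entries)
```

In this sketch, an entry of `b"STRIPE$"` removes `b"STRIPE$DISK1"` from the display, while an entry of `b"**"` removes only names that really begin with two asterisks.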


To redisplay values set previously, select Use default values. 


7.6.6 OpenVMS Memory Filters 


When you select Memory Filters on the Filter tabs, the Data Analyzer
displays an OpenVMS Memory Filters page that is similar to the one shown
in Figure 7-15.


Figure 7-15 OpenVMS Memory Filters


The OpenVMS Memory page (Figure 3-10) displays the values on this filter page.


The OpenVMS Memory Filters page allows you to change values for the following 
data: 


Data                 Description

Working Set Count    The number of physical pages or pagelets of memory that the
                     process is using.

Working Set Size     The number of pages or pagelets of memory the process is allowed
                     to use. The operating system periodically adjusts this value based
                     on an analysis of page faults relative to CPU time used. An increase
                     in this value in large units indicates a process is receiving a lot of
                     page faults and its memory allocation is increasing.

Working Set Extent   The number of pages or pagelets of memory in the process’s
                     WSEXTENT quota as defined in the user authorization file (UAF).
                     The number of pages or pagelets will not exceed the value of the
                     system parameter WSMAX.

Page Fault Rate      The number of page faults per second for the process.

Page I/O Rate        The rate of read attempts necessary to satisfy page faults (also
                     known as page read I/O or the hard fault rate).


7.6.7 OpenVMS Page/Swap File Filters 


When you select Page/Swap File on the Filter tabs, the Data Analyzer displays
the OpenVMS Page/Swap File Filters page (Figure 7-16).


Figure 7-16 OpenVMS Page/Swap File Filters


The OpenVMS I/O Summary page (Figure 3-12) displays the values that you set
on this filter page.


This filter page allows you to change values for the following data: 


Data                Description

Used Blocks         The number of used blocks within the file.

Page File % Used    The percentage of the blocks from the page file that have been used.

Swap File % Used    The percentage of the blocks from the swap file that have been used.

Total Blocks        The total number of blocks in paging and swapping files.

Reservable Blocks   Number of reservable blocks in each paging and swapping file
                    currently installed. Reservable blocks can be logically claimed by a
                    process for a future physical allocation. A negative value indicates
                    that the file might be overcommitted. Note that a negative value is
                    not an immediate concern but indicates that the file might become
                    overcommitted if physical memory becomes scarce.

                    Note: Reservable blocks are not used in more recent versions of
                    OpenVMS.


You can also select (turn on) or clear (turn off) the following options:

• Show page files

• Show swap files


7.7 Customizing Event Escalation


You can customize the way events are displayed in the Event pane of the System
Overview window (Figure 2-25) and configure events to be signaled to OPCOM or
HP OpenView. You do this by setting the criteria that determine whether events
are signaled on the Event Escalation Customization page (Figure 7-17).


Note 


Event escalation is the one set of Data Analyzer parameters that you can 
adjust at all four configuration levels (Application, Operating System, 
Group, and Node). 


When you select any of the customization options, the Data Analyzer displays a 
tabbed page similar to the one shown in Figure 7-17. 


Figure 7-17 Event Escalation Customization 


[Figure: Customization - OpenVMS Default Settings page, Event Escalation tab.
The "Event escalation parameters" area includes an Event Window section, an
HP OpenView section, and a "Use default values" button. The HP OpenView
section shows an "Escalate events using HP OpenView" check box, "Escalate
events over severity threshold (0-100)" set to 90, and "Timeout triggering
escalation of events (secs)" set to 600. The level indicator reads
"Global OpenVMS".]


The Event Escalation Customization page contains the following sections: 

• Event Window

  With the exception of “Informational event timeout (secs)”, the items in this
  section are dimmed because they have not yet been implemented. However,
  you can set the number of seconds that an informational event is displayed in
  the Event pane of the System Overview window (Figure 2-25). (The default
  is 30 seconds.)

• OPCOM

  The items in this section are dimmed if you are not using an OpenVMS
  system.

  If you are using an OpenVMS system, you can check the box in the OPCOM
  section of the page and then enter two values that work together to determine
  whether an event is sent to OPCOM:

  - Escalate events over severity threshold (0-100)

    The severity level over which an event might be sent to OPCOM if the
    second criterion is met.

  - Timeout triggering escalation of events (secs)

    The length of time, in seconds, that an event (over a severity threshold
    that you have entered) is displayed in the Event pane of the System
    Overview window (Figure 2-25) before the event is sent to OPCOM.

• HP OpenView

  Values that you enter have no effect if you do not have HP OpenView agents
  installed and configured on your system. (For configuration instructions, see
  the next section.)

  If HP OpenView agents are installed and configured on your system, you can
  check the box in the OpenView section of the page and then enter two values
  that work together to determine whether an event is sent to OpenView:

  - Escalate events over severity threshold (0-100)

    The severity level over which an event might be sent to OpenView if the
    second criterion is met.

  - Timeout triggering escalation of events (secs)

    The length of time, in seconds, that an event (over a certain severity
    threshold) is displayed in the Event pane of the System Overview window
    (see Figure 2-25) before the event is sent to OpenView.

  The following table compares Availability Manager and OpenView severity
  levels:

  Availability Manager   OpenView

  0 - 19                 Normal
  20 - 39                Warning
  40 - 59                Minor
  60 - 79                Major
  80 - 100               Critical
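

The severity comparison in the table can be expressed as a small lookup. This is only a sketch of the ranges listed above; the function name is hypothetical, and it assumes integer severities from 0 through 100.

```python
def openview_severity(am_severity):
    """Map an Availability Manager severity (0-100) to an OpenView level."""
    if not 0 <= am_severity <= 100:
        raise ValueError("severity must be between 0 and 100")
    if am_severity < 20:
        return "Normal"
    if am_severity < 40:
        return "Warning"
    if am_severity < 60:
        return "Minor"
    if am_severity < 80:
        return "Major"
    return "Critical"
```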


Important 


For an event to be escalated using OPCOM or HP OpenView, the following 
conditions must be met: 


• On the Event Customizations page (Figure 7-18), the OPCOM or HP
  OpenView box must be checked.

• On the Event Escalation page (Figure 7-17), the box in the OPCOM
  or HP OpenView section of the page must be checked.

• On the Event Escalation page (Figure 7-17), the severity of an
  event must meet or exceed the corresponding severity threshold
  for the event, which is shown on the Event Customizations page
  (Figure 7-18).

• The event must be displayed in the Event pane of the System
  Overview window (Figure 2-25) for the required length of time
  before the event is sent to OPCOM or OpenView. (The default is 10
  minutes.)
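

The four conditions listed above combine as a logical AND. The sketch below shows that combination; the function and parameter names are hypothetical, and the 600-second default corresponds to the 10-minute default mentioned above.

```python
def should_escalate(event_box_checked, section_box_checked,
                    severity, threshold, seconds_displayed,
                    timeout_secs=600):
    """Return True only if all four escalation conditions are met.

    event_box_checked:   OPCOM or HP OpenView box on the Event Customizations page
    section_box_checked: box in the OPCOM or HP OpenView section of the
                         Event Escalation page
    """
    return (event_box_checked
            and section_box_checked
            and severity >= threshold
            and seconds_displayed >= timeout_secs)
```

An event that meets the severity threshold but has been displayed for only 599 seconds would not yet be escalated in this sketch.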


Figure 7-18 Event Customizations 


[Figure: Event Customizations page for the event "CFGDON, configuration
done". The page shows Severity and Occurrence fields; Escalation actions
check boxes for User, OPCOM, and HP OpenView; a User Action field; and an
"Event explanation and investigation hints" panel that reads: The
Availability Manager has made a connection to the data collection node and
will start collecting data according to the customize data collection
options selected. This is an informational event to indicate that the node
has been recognized. No further investigation is required. The level
indicator reads "Global OpenVMS".]


7.7.1 Configuring HP OpenView on Your Windows or HP-UX System 


Note 


The instructions in this section are for configuring HP OpenView 
on Windows. (The configuration for HP-UX systems is very similar; 
instructions, however, are not included in this section.) 


Installing the HP OpenView Server 
Prior to configuring HP OpenView, you must perform two steps: 


1. Install the HP OpenView server software on a Windows or an HP-UX system.
   (The Data Analyzer can forward events to either a Windows or an HP-UX
   system.) For information about performing these installations, see the HP
   OpenView documentation.

2. Install the HP OpenView template for the Data Analyzer on the HP
   OpenView server. This is described in the Guide for Setting Up the
   Availability Manager to Forward Events to OpenView on the Documentation
   page on the Availability Manager Web site:

   http://h71000.www7.hp.com/openvms/products/availman/docs.html


Configuring the HP OpenView Server and Agents 
You can run the Data Analyzer on a Windows or on an OpenVMS system. 


If you run the Data Analyzer on a Windows system, follow these steps: 


1. Configure the HP OpenView server so that the Windows system is a
   configured node.

2. Deploy the Availability Manager template, AvailMan, to the Windows system.

   The AvailMan template is stored under "Policy management \ Policies grouped
   by type" in the OpenView Operations window:

   HP OpenView\Operations Manager

If you run the Data Analyzer on an OpenVMS system, follow these steps:

1. Install and configure the HP OpenView agents on the OpenVMS system
   according to the instructions in the document "About OpenVMS Managed
   Nodes," which is a link on the HP OpenView Agents for OpenVMS Web page:

   http://h71000.www7.hp.com/openvms/products/openvms_ovo_agent/index.html

2. Deploy the Availability Manager template, AvailMan, to the OpenVMS
   system.


7.7.2 Using HP OpenView on Your System 


On the OpenView server you can create or modify policies or templates of the
Open Message Interface group to manipulate events that the Data Analyzer
has escalated. For the parameters and option fields that the Data Analyzer
sets, see Table 7-5.


Table 7-5 Parameters and Option Fields Used with OpenView

Parameter or Option Field     Description

<$MSG_APPL>                   Application: "AvailMan" (appears to be case
                              sensitive)

<$MSG_OBJECT>                 Object: 6-character event name (example:
                              "HIBIOR")

<$MSG_GRP>                    Group: Node originating the event (example:
                              "CMOVEQ")

<$MSG_SEV>                    Derived from <$OPTION(SEVERITY)> in the Data
                              Analyzer; the Data Analyzer maps SEVERITY to
                              NORMAL, WARNING, MINOR, MAJOR, CRITICAL

<$MSG_TEXT>                   Message text: Event description (example:
                              "CMOVEQ buffered I/O rate is high")

<$MSG_NODE>                   Node running AvailMan

<$MSG_NODE_NAME>              Node running AvailMan

<$OPTION(NODE)>               Node originating the event (example:
                              "CMOVEQ")

<$OPTION(GROUP)>              Group to which originating node belongs
                              (example: "Debug cluster")

<$OPTION(SEQUENCE_NUMBER)>    AM internal event sequence number (example:
                              " 14")

<$OPTION(SEVERITY)>           AM event severity (0-100) (example: "60")

<$OPTION(EVENT)>              6-character event name (example: "HIBIOR")

<$OPTION(TIME)>               Original time event posted (example:
                              "15-Aug-2005 14:41:44.164")



7.8 Customizing Events and User Notification of Events 


You can customize a number of characteristics of the events that are displayed in
the Event pane of the System Overview window (Figure 2-25). You can also use
customization options to notify users when specific events occur.

When you select Operating System -> Customize OpenVMS... or
Operating System -> Customize Windows NT... from the System Overview
window Customize menu, the Data Analyzer displays a tabbed page similar to
the one shown in Figure 7-19.


Figure 7-19 Event Customizations

[Screenshot: "Customization - OpenVMS Default Settings" window with the
Event Customizations page for the CFGDON (configuration done) event,
showing the Severity field, the Escalation actions check boxes (User,
OPCOM, HP OpenView), the User Action field, and the "Event explanation and
investigation hints" pane.]


On OpenVMS systems, you can customize events at the operating system,
group, or node level. On Windows systems, you can customize events at the
operating system or node level.

Keep in mind that an event that you customize at the group level overrides the
value set at a previous (higher) level (see Table 7-1).

7.8.1 Customizing Events


You can change the values for any data that is available—that is, not dimmed—on 
this page. The following table describes the data you can change: 


Data Description 


Severity Controls the severity level at which events are displayed in the Event 
pane of the System Overview window (Figure 2-25). By default, all 
events are displayed. Increasing this value reduces the number of 
event messages in the Event pane of the System Overview window 
(Figure 2-25) and can improve perceived response time. 


Occurrence Each Availability Manager event is assigned an occurrence value, 
that is, the number of consecutive data samples that must exceed the 
event threshold before the event is signaled. By default, events have 
low occurrence values. However, you might find that a certain event 
indicates a problem only when it occurs repeatedly over an extended 
period of time. You can change the occurrence value assigned to that 
event so that the Data Analyzer signals the event only when necessary. 


For example, suppose page fault spikes are common in your 
environment, and the Data Analyzer frequently signals intermittent 
HITTLP, total page fault rate is high events. You could change the 
event’s occurrence value to 3, so that the total page fault rate must 
exceed the threshold for three consecutive collection intervals before 
being signaled to the event log. 


To avoid displaying insignificant events, you can customize an event so 
that the Data Analyzer signals it only when it occurs continuously. 


Threshold Most events are checked against only one threshold; however, some
events have dual thresholds: the event is triggered if either one is true.
For example, for the LOVLSP, node disk volume free space is low event,
the Data Analyzer checks both of the following thresholds:

• Number of blocks remaining

• Percentage of total blocks remaining

Escalation actions You can enter one or more of the following values:

• User: If the event occurs, the Data Analyzer refers to the User
  Action field to determine what action to take.

• OPCOM: If the event occurs, and certain conditions are met (see
  Section 7.7), the Data Analyzer passes that event to OPCOM.
  (Data Analyzer on OpenVMS only)

• HP OpenView: If the event occurs and certain conditions are
  met (see Section 7.7), the Data Analyzer passes that event to HP
  OpenView. (OpenView agents must be installed and configured on
  the Data Analyzer node.)


User Action When the Event escalation action field is set to User, User Action is no 
longer dimmed. You can enter the name of a procedure to be executed 
if the event displayed at the top of the page occurs. To use this field, 
see the instructions in Section 7.8.2. 


The “Event explanation and investigation hints” section of the Event 
Customizations page, which is not customizable, includes a description of the 
event displayed and suggestions for how to correct any problems that the event 
signals. 
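
The Occurrence and Threshold settings described in the table can be
illustrated with a short Python sketch. It is not the Data Analyzer's code:
the class and function names are hypothetical, and two details are
assumptions of the sketch, namely that a sample must be strictly above the
threshold to count and that the dual-threshold comparisons are strict.

```python
class OccurrenceCounter:
    """Signal an event after N consecutive samples exceed a threshold."""

    def __init__(self, threshold: float, occurrence: int) -> None:
        if occurrence < 1:
            raise ValueError("occurrence must be at least 1")
        self.threshold = threshold
        self.occurrence = occurrence
        self._consecutive = 0

    def sample(self, value: float) -> bool:
        """Record one data sample; return True if the event would be signaled."""
        if value > self.threshold:
            self._consecutive += 1
        else:
            # A sample at or below the threshold resets the run.
            self._consecutive = 0
        return self._consecutive >= self.occurrence


def dual_threshold_exceeded(free_blocks: int, total_blocks: int,
                            min_blocks: int, min_percent: float) -> bool:
    """Dual threshold: trigger if either the block count or the percentage is low."""
    percent_free = 100.0 * free_blocks / total_blocks
    return free_blocks < min_blocks or percent_free < min_percent
```

With an occurrence value of 3, a single spike or two consecutive spikes do
not signal the event; the third consecutive sample above the threshold does.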
7.8.2 Entering a User Action


Note 


OpenVMS and Windows execute the User Action procedure somewhat 
differently, as explained in the following paragraphs. 


The following notes pertain to writing and executing User Action commands or 
command procedures. These notes apply to User Actions on both OpenVMS and 
Windows systems. 


The procedure that you specify as the User Action is executed in the following 
manner: 


— It is issued to the operating system that is running the Data Analyzer. 


— It is issued as a process separate from the one running the Data Analyzer
to avoid affecting its operation.


— It is run under the same account as the one running the Data Analyzer. 


User Actions are intended to execute procedures that do not require 
interactive displays or user input. 


You can enter User Actions for events on either a systemwide basis or a 
per-node basis: 


— On a systemwide basis, the User Action is issued for an event that occurs
on any node.


— On a per-node basis, the User Action is issued for an event that occurs
only on a specific node.


If event logging is enabled, the Data Analyzer writes events to the event log
file (called AnalyzerEvents.log by default on OpenVMS systems and Windows
systems). A status line matching the original line indicates whether the User
Action was successfully issued. For example:


AMGR/KOINE -- 13-Apr-2005 15:33:02.531 --<0,CFGDON>KOINE configuration done
AMGR/KOINE -- 13-Apr-2005 15:33:02.531 --<0,CFGDON>KOINE configuration done
(User Action issued for this event on the client O/S)


Other events might appear between the first logging and the status line. The
log file does not indicate whether the User Action executed successfully. You
must obtain the execution status from the operating system, for example, the
OpenVMS batch procedure log.


The User Action functionality might be enhanced in a future release of the 
Data Analyzer, but backward compatibility is not guaranteed for the format of 
User Action procedure strings or for the method of executing the procedures 
on a particular operating system. 
7.8.2.1 Executing a Procedure on an OpenVMS System


Enter the name of the procedure you want OpenVMS to execute (see Figure 7-19) 
after "User Action." Use the following format: 


disk:[directory]filename.COM

where: 

— disk is the name of the disk where the procedure resides. 

— directory is the name of the directory where the procedure resides. 


—  filename.COM is the file name of the command procedure you want OpenVMS 
to execute. The file name must follow OpenVMS file-naming conventions. 


The User Action procedure must contain one or more DCL command statements 
that form a valid OpenVMS command procedure. 


The User Action procedure is passed as a string value to the DCL command 
interpreter as follows: 


SUBMIT/NOPRINTER/LOG user_action_procedure arg_1 arg_2 arg_3 arg_4 


where: 


• The first command is the DCL command SUBMIT with associated qualifiers.

• user_action_procedure is a valid OpenVMS file name.

• The arguments the Data Analyzer supplies to the User Action procedure are
  the following:

  Argument   Description

  arg_1      Node name of the node that generated the event.
  arg_2      Date and time that the event was generated.
  arg_3      Name of the event.
  arg_4      Description of the event.


The Data Analyzer does not interpret the string contents. You can supply 

any content in the User Action procedure that DCL accepts in the OpenVMS 
environment for the user account running the Data Analyzer. However, if you 
include arguments in the User Action procedure, they might displace or overwrite 
arguments that the Data Analyzer supplies. 


A suitable batch queue must be available on the Data Analyzer computer to be 
the target of the SUBMIT command. See the HP OpenVMS DCL Dictionary for 
the SUBMIT, INITIALIZE/QUEUE, and START/QUEUE commands for use of 
batch queues and the queue manager. 
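
To make the argument order concrete, the following Python sketch composes a
string with the layout shown above. It is an illustration only: the function
name is hypothetical, and the quoting of the four arguments is an assumption
of the sketch, because this guide does not state how the Data Analyzer
quotes them.

```python
def build_submit_command(user_action: str, node: str, timestamp: str,
                         event: str, description: str) -> str:
    """Compose a SUBMIT string with arguments in the documented order.

    Quoting is an assumption: each argument is wrapped in quotation marks
    so that values containing spaces stay together.
    """
    args = [node, timestamp, event, description]  # arg_1 through arg_4
    # DCL represents a quotation mark inside a quoted string by doubling it.
    quoted = ['"' + a.replace('"', '""') + '"' for a in args]
    return " ".join(["SUBMIT/NOPRINTER/LOG", user_action, *quoted])
```

Inside the command procedure, the four arguments are available as the DCL
symbols P1 through P4, in the same order.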


An example of a DCL command procedure is:

DISK$PAYROLL:[AM_COMS]DISK_OFFLINE.COM

The contents of the DCL command procedure might be the following:


$ if (p3.eqs."DSKOFF") .and. (p1.eqs."PAYROL")
$ then
$   mail/subject="''p2' ''p3' ''p4'" urgent_instructions.txt -
        call_center,finance,adams
$ else
$   mail/subject="''p2' ''p3' ''p4'" instructions.txt call_center
$ endif

The pn numbers in the DCL procedure correspond in type, number, and position
to the arguments in the preceding table.


You might use a procedure like this one to notify several groups if the payroll disk 
goes off line, or to notify the call center if any other event occurs. 


7.8.2.2 Executing a Procedure on a Windows System 


Enter the name of the procedure you want Windows to execute using the following 
format: 


device:\directory\filename.BAT

where: 

— device is the disk on which the procedure is located. 

— directory is the folder in which the procedure is located. 


— filename.BAT is the name of the command file to be executed. 


Notes 


The file name must follow Windows file-naming conventions. However, 
due to the processing of spaces in the Java JRE, HP recommends that you 
not use spaces in a path or file name. 


HP recommends that you use a batch file to process and call procedures 
and applications. 


The Data Analyzer passes the User Action procedure to the Windows command 
interpreter as a string value as follows: 


"AT time CMD/C user_action_procedure arg_1 arg_2 arg_3 arg_4" 


where: 


• AT is the Windows command that schedules commands and programs at a
  specified time and date.

• The time substring is a short period of time (approximately 2 minutes) in
  the future so that the AT utility processes the User Action procedure today
  rather than tomorrow. This is necessary because the AT utility cannot
  execute a procedure "now" rather than at an explicitly stated time.

• user_action_procedure is a Windows command or valid file name. The file
  must contain one or more Windows command statements to form a valid
  command procedure. (See the example in this section.)

• The arguments are listed in the following table:

  Argument   Description

  arg_1      Node name of the node that generated the event.
  arg_2      Date and time that the event was generated.
  arg_3      Name of the event.
  arg_4      Description of the event.


The Data Analyzer does not interpret the string contents. You can supply any
content in the string that the Windows command-line interpreter accepts for the
user account running the Data Analyzer. However, if you include arguments in
the User Action procedure, they might displace or overwrite arguments that the
Data Analyzer supplies.


You cannot specify positional command-line switches or arguments to the AT 
command, although you can include switches in the User Action procedure 
substring as qualifiers to the user-supplied command. This is a limitation of both 
the Windows command-line interpreter and the way the entire string is passed 
from the Data Analyzer to Windows. 


The Schedule service must be running on the Data Analyzer computer in order to
use the AT command. However, the Schedule service does not run by default. To
start the Schedule service, see the Windows documentation for instructions in the
use of the CONTROL PANEL->SERVICES->SCHEDULE->[startup button].
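
The role of the time substring can be illustrated with a short Python
sketch that composes a string in the layout shown earlier in this section.
It is not the Data Analyzer's code; the function name and the fixed
2-minute delay are assumptions based on the description above.

```python
from datetime import datetime, timedelta


def build_at_command(user_action: str, args: list[str],
                     now: datetime, delay_minutes: int = 2) -> str:
    """Compose an AT command that runs shortly after `now`."""
    run_at = now + timedelta(minutes=delay_minutes)
    # AT takes a 24-hour hh:mm time; seconds are not part of the format.
    time_text = run_at.strftime("%H:%M")
    return " ".join(["AT", time_text, "CMD/C", user_action, *args])
```

Because the scheduled time is slightly later than the current time, the
next occurrence of that time is only minutes away. A time that had already
passed would instead be matched on the following day.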


Windows Example 
To set up a user action, follow these steps: 


1. Select an event on the Event Customizations page, for example, HIBIOR (see 
Figure 7-20). 


2. Change the Event escalation action to User. 
3. Enter the name of the program to run. For example: 


c:\send_message.bat 


Figure 7-20 User Action Example

[Screenshot: "Customization - OpenVMS Default Settings" window with the
Event Customizations page for the HIPWIO (high paging write I/O rate)
event, showing Severity 80, a Threshold of 5 pagefile writes per second,
the User and OPCOM escalation actions checked, and the User Action field
set to c:\send_message.bat.]



The command line parameters are automatically added when the Data Analyzer
passes the command to the command processor.

The contents of "send_message.bat" are the following:

net send affc17 "P4:system event: %1 %2 %3 %4"


On the target node, AFFC17, a message similar to the following one is displayed:

[Screenshot: Messenger Service message box]

You can now apply the User Action to one node, all nodes, or a group of nodes, as
explained in Section 7.8.2.

7.9 Customizing Security Features

The following sections explain how to change the following security features:

• Passwords for groups and nodes

• Data Analyzer passwords for OpenVMS and Windows Data Collector nodes

• Security triplets on OpenVMS Data Collector nodes

• Password on a Windows Data Collector node


Note 


OpenVMS Data Collector nodes can have more than one password: each 
password is part of a security triplet. (Windows nodes allow you to have 
only one password per node.) 


7.9.1 Customizing Passwords for Groups and Nodes 


The Windows and OpenVMS Customization pages at the operating system, group,
or node level are similar to the one shown in Figure 7-6. Each contains a
tab labeled Security. If you select this tab on either system, the Data
Analyzer displays a page similar to the one shown in Figure 7-21.

Figure 7-21 OpenVMS Security Customization

[Screenshot: "Customization - OpenVMS Default Settings" window with the
Security page, showing a Collector Password field that contains 1DECAMDS
and the instruction "Enter an 8-character Data Collector password."]


The level at which you can make password changes depends on whether you 
select the Security tab at the operating system, group, or node level. 


Changing Passwords at the Group Level 

If you monitor several groups, but the password for the nodes in one of those 
groups is different from the password for nodes in other groups, right-click the 
group you want to change, select Customize from the list, select the Security 
tab, and change the password. The new password is then used for each node that 
is a member of that group. 


Changing Passwords at the Node Level 

As a second example, to change the password of one node in a group to a
different password than the other nodes in the group, right-click that node, select
Customize from the list, select the Security tab, and change the password
to one that differs from the other nodes in the group. For that node, the new
password overrides the group password.


In the second password example, if you want to set the password for the single 
node back to the password that the rest of the group uses, click Use default 
values. The password value for the node now comes from the group-level 
password setting. At this point, if you change the group password, all nodes 
in the group get the new password. Additional information about changing 
passwords for security is in Section 7.9. 
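
The override behavior in these two examples amounts to a lookup that falls
back from node to group to operating system level. The following Python
sketch is one way to picture it; the function name is hypothetical, and
None is used here to stand for "Use default values".

```python
from typing import Optional


def effective_password(os_default: str,
                       group_password: Optional[str] = None,
                       node_password: Optional[str] = None) -> str:
    """Return the password used for a node.

    None means "Use default values", that is, inherit from the next
    level up (node, then group, then operating system).
    """
    if node_password is not None:
        return node_password
    if group_password is not None:
        return group_password
    return os_default
```

Setting the node value back to None corresponds to clicking Use default
values: the node again follows the group password, including later changes
to it.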


7.9.2 Changing Data Analyzer Passwords 


You can change the passwords that the Windows Data Analyzer uses for 
OpenVMS Data Collector nodes and for Windows Data Collector nodes. The 
following sections explain how to perform both actions. 
7.9.2.1 Changing a Data Analyzer Password for an OpenVMS Data Collector Node


When you select Customize OpenVMS... on the Customize menu of the System 
Overview window, the Data Analyzer displays a default customization page. On it 
is a tab marked Security, which, if you select it, displays the OpenVMS Security 
Customization page (Figure 7-21). 


To change the default password for the Data Analyzer to use to access OpenVMS 
Data Collector nodes, enter a password of exactly 8 uppercase alphanumeric 
characters. The Data Analyzer uses this password to access OpenVMS Data 
Collector nodes. This password must match the password that is part of the 
OpenVMS Data Collector security triplet (Section 1.3.3). 


When you are satisfied with your password, click OK. Exit the Data Analyzer and 
restart the application for the password to take effect. 


7.9.2.2 Changing a Data Analyzer Password for a Windows Data Collector Node

When you select Customize Windows NT... on the Customize menu of the
System Overview window, the Data Analyzer displays a Windows Security
Customization page (Figure 7-22).

Figure 7-22 Windows Security Customization

[Screenshot: "Customization - Windows NT Default Settings" window with the
Security page, showing a Collector Password field that contains AvailMan
and the instruction "Enter an 8-character Data Collector password."]

To change the default password for the Data Analyzer to use to access Windows
Data Collector nodes, enter a password of exactly 8 alphanumeric characters.
Note that this password is case sensitive; any time you type it, you must use the
original capitalization.



This password must also match the password for the Windows Data Collector 
node that you want to access. (See Section 7.9.3 for instructions for changing that 
password.) 


When you are satisfied with your password, click OK. Exit and restart the Data 
Analyzer for the password to take effect. 
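
The password rules stated in Sections 7.9.2.1 and 7.9.2.2 can be checked
before you type a new value. The following Python sketch is a convenience
check only; the function names are hypothetical, and treating
"alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Exactly 8 characters; ASCII letters and digits only (an assumption).
_EIGHT_ALNUM = re.compile(r"[A-Za-z0-9]{8}")
_EIGHT_UPPER_ALNUM = re.compile(r"[A-Z0-9]{8}")


def is_valid_openvms_password(password: str) -> bool:
    """OpenVMS Data Collector: exactly 8 uppercase alphanumeric characters."""
    return _EIGHT_UPPER_ALNUM.fullmatch(password) is not None


def is_valid_windows_password(password: str) -> bool:
    """Windows Data Collector: exactly 8 alphanumeric characters, case sensitive."""
    return _EIGHT_ALNUM.fullmatch(password) is not None
```

Both checks only validate the form of the password. The value must still
match the password configured on the Data Collector node.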
7.9.3 Changing a Password on a Windows Data Collector

To change the Data Collector password in the Registry, follow these steps:

1.  Click the Windows Start button. On the menu displayed, first select
    Programs, then Accessories, and then Command Prompt.

2.  Type regedit after the angle prompt (>).

    The system displays a screen for the Registry Editor, with a list of
    entries under My Computer.

3.  On the list displayed, expand the HKEY_LOCAL_MACHINE entry.

4.  Double-click SYSTEM.

5.  Click CurrentControlSet.

6.  Click Services.

7.  Click damdrvr.

8.  Click Parameters.

9.  Double-click Read Password. Then type a new 8-character alphanumeric
    password, and click OK to make the change.

10. To store the new password, click Exit under File on the main menu bar.

11. On the Control Panel, select Services and then Stop for "PerfServ."

12. Again on the Control Panel, select Devices and then Stop for "damdrvr."

13. First restart damdrvr under "Devices" and then restart PerfServ under
    "Services."

    This step completes the change of your Data Collector password.


7.10 Monitoring Processes on a Node 


As the Data Analyzer monitors all the processes on the system, you can configure 
the tool to notify you when particular processes are created or exit on your 
system. The Data Analyzer can watch up to eight processes on an individual 
node. This customization is available at the system, group or node level. (You 
cannot, however, use this feature to notify you about processes that should not be 
there.) 


When you bring up the Customization Page, it contains a tab labeled Watch
Process. If you select this tab, the Data Analyzer displays a Watch Process
page similar to the one shown in Figure 7-23.

Figure 7-23 Process Watch

[Screenshot: "Customization - OpenVMS Default Settings" window with the
Watch Process page, showing a "Processes to Watch" box on the left and a
"Watch Process Configuration Explanation" pane on the right.]


An explanation of the watch process feature is displayed on the right side of the 
page. You can enter up to 8 processes in the box on the left side of the page. 
After you enter process names, the Data Analyzer monitors these processes on 
the node you have selected. 


For a process that is not present on the node at the time you entered it on the
Watch Process page, the Data Analyzer displays the following event in the Event
pane of the System Overview window (Figure 2-25):


NOPROC -- The process process-name has disappeared on 
the node node-name. 


If a process that a NOPROC event signalled reappears on the node, the Data 
Analyzer displays the following event in the Event pane of the System Overview 
window (Figure 2-25): 


PRCFND -- The process process-name has recently 
reappeared on the node node-name. 
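
The relationship between the two events can be pictured as a small state
tracker. The following Python sketch is illustrative only: the class name
and the polling model are assumptions, and the process names used in the
usage note are hypothetical.

```python
class ProcessWatch:
    """Track watched process names and report NOPROC / PRCFND transitions."""

    MAX_WATCHED = 8  # limit stated in the text

    def __init__(self, names: list[str]) -> None:
        if len(names) > self.MAX_WATCHED:
            raise ValueError("at most eight processes can be watched")
        # Each name starts as "not missing" so that a process absent at
        # the first poll produces a NOPROC event.
        self._missing = {name: False for name in names}

    def poll(self, running: set[str]) -> list[tuple[str, str]]:
        """Compare watched names with the running set; return new events."""
        events = []
        for name, was_missing in self._missing.items():
            is_missing = name not in running  # comparison is case-sensitive
            if is_missing and not was_missing:
                events.append(("NOPROC", name))
            elif was_missing and not is_missing:
                events.append(("PRCFND", name))
            self._missing[name] = is_missing
        return events
```

For example, watching PAYROLL_JOB and BACKUP_JOB while only PAYROLL_JOB is
running yields one NOPROC event for BACKUP_JOB. A later poll in which
BACKUP_JOB is running yields a PRCFND event for it.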
CPU Process States


The CPU process states shown in Table A-1 are displayed in the OpenVMS CPU 
Process States page (Figure 3-8) and in the OpenVMS Process Information page 
(Figure 3-23). 


Table A-1 CPU Process States

Process
State     Description

CEF       Common Event Flag, waiting for a common event flag

COLPG     Collided Page Wait, involuntary wait state; likely to indicate a
          memory shortage, waiting for hard page faults

COM       Computable; ready to execute

COMO      Computable Outswapped, COM, but swapped out

CUR       Current, currently executing in a CPU

FPG       Free Page Wait, involuntary wait state; most likely indicates a
          memory shortage

LEF       Local Event Flag, waiting for a Local Event Flag

LEFO      Local Event Flag Outswapped; LEF, but outswapped

HIB       Hibernate, voluntary wait state requested by the process; it is
          inactive

HIBO      Hibernate Outswapped, hibernating but swapped out

MWAIT     Miscellaneous Resource Wait, involuntary wait state, possibly
          caused by a shortage of a systemwide resource, such as no page or
          swap file capacity or no synchronizations for single-threaded
          code.

          Types of MWAIT states are shown in the following table:

          MWAIT State   Definition

          BWAIT         Process waiting for buffered I/O byte count quota.
          JWAIT         Process in either BWAIT or TWAIT state.
          TWAIT         Process waiting for timer queue entry quota.
          EXH           Kernel thread in exit handler (not currently used).
          IMODE         Kernel thread waiting to acquire inner-mode
                        semaphore.
          PSXFR         Process waiting during a POSIX fork operation.
          RWAST         Process waiting for system or special kernel mode
                        AST.
          RWMBX         Process waiting because mailbox is full.
          RWNPG         Process waiting for nonpaged dynamic memory.
          RWPFF         Process waiting because page file is full.
          RWPAG         Process waiting for paged dynamic memory.
          RWMPE         Process waiting because modified page list is empty.
          RWMPB         Process waiting because modified page writer is busy.
          RWSCS         Process waiting for distributed lock manager.
          RWCLU         Process waiting because OpenVMS Cluster is in
                        transition.
          RWCAP         Process waiting for CPU that has its capability set.
          RWCSV         Kernel thread waiting for request completion by
                        OpenVMS Cluster server process.

PFW       Page Fault Wait, involuntary wait state; possibly indicates a
          memory shortage, waiting for hard page faults.

RWAST     Resource Wait State, waiting for delivery of an asynchronous
          system trap (AST) that signals a resource availability; usually an
          I/O is outstanding or a process quota is exhausted.

RWBRK     Resource Wait for BROADCAST to finish

RWCAP     Resource Wait for CPU Capability

RWCLU     Resource Wait for Cluster Transition

RWCSV     Resource Wait for Cluster Server Process

RWIMG     Resource Wait for Image Activation Lock

RWLCK     Resource Wait for Lock ID data base

RWMBX     Resource Wait on MailBox, either waiting for data in mailbox (to
          read) or waiting to place data (write) into a full mailbox (some
          other process has not read from it; mailbox is full so this
          process cannot write).

RWMPB     Resource Wait for Modified Page writer Busy

RWMPE     Resource Wait for Modified Page list Empty

RWNPG     Resource Wait for Non Paged Pool

RWPAG     Resource Wait for Paged Pool

RWPFF     Resource Wait for Page File Full

RWQUO     Resource Wait for Pooled Quota

RWSCS     Resource Wait for System Communications Services

RWSWP     Resource Wait for Swap File space

SUSP      Suspended, wait state process placed into suspension; it can be
          resumed at the request of an external process

SUSPO     Suspended Outswapped, suspended but swapped out
Tables of Events


This appendix contains the following tables of events:

• OpenVMS Events: Table B-1

• Windows Events: Table B-2

Each table provides the following information:

• Alphabetical list of the events that the Availability Manager Data Analyzer
  signals in the Event pane of the System Overview window (Figure 1-1)

• Abbreviation and brief description of each event (also displayed in the
  Event pane)

• Explanation of the event and a suggestion for remedial action, if applicable


Table B—1 OpenVMS Events 

Event Description Explanation Recommended Action 

CFGDON - Configuration The Availabilty Manager has made This informational event indicates that 
done a connection to the data collection the node is recognized. No further 

node and will start collecting data investigation is required. 
according to the customize data 
collection options selected. 

CHGMAC Changed "The Availabilty Manager has This is an informational event to indicate 
MAC changed the MAC address used to that the MAC address used to with the 
address communicate with the node. node has changed. This is usually done 

when a multicase Hello packet has a 
different MAC address. OpenVMS nodes 
may have the MAC address changed 

for a number of reasons which include 
starting DECnet. 

DCCOLT  Data collection completed
  Explanation: Specifies the amount of time the specified data
  collection took to complete.
  Recommended Action: This event records the amount of time a data
  collection has taken, and is thrown if the data collection took longer
  than the collection interval. This event is thrown along with the
  DCSLOW event to document the actual data collection rates as compared
  to the data collection intervals under Customize.

DCSLOW  Data collection taking longer than collection interval
  Explanation: The specified data collection is taking longer to
  complete than the data collection interval.
  Recommended Action: This event usually occurs where the connection
  between the Data Analyzer and the Data Collector is slow or is
  carrying a large amount of network traffic. Check the associated
  DCCOLT event for the actual time the data collection took. Some
  possible actions are:
    • lower the number of nodes monitored
    • lower the amount of data collected for each node
    • increase the data collection interval for less important data
  The data collections that can take a while to complete are:
    • CPU process, Memory or I/O for systems with many processes
    • Disk Status and Disk Volume for systems with many disks
    • Lock Contention data for systems with a large resource hash table
      size or a large number of resources to scan

DPGERR  Error executing driver program
  Explanation: The Data Collector has detected a program error while
  executing the data collection program.
  Recommended Action: This event can occur if you have a bad driver
  program library, or there is a bug in the driver program. Make sure
  you have the program library that shipped with the kit; if it is
  correct, contact your customer support representative with the full
  text of the event.

DSKERR  High disk error count
  Explanation: The error count for the disk device exceeds the
  threshold.
  Recommended Action: Check error log entries for device errors. A disk
  device with a high error count could indicate a problem with the disk
  or with the connection between the disk and the system.

DSKINV  Disk is invalid
  Explanation: The valid bit in the disk device status field is not set.
  The disk device is not considered valid by the operating system.
  Recommended Action: Make sure that the disk device is valid and is
  known to the operating system.
DSKMNV  Disk in mount verify state
  Explanation: The disk device is performing a mount verification.
  Recommended Action: The system is performing a mount verification for
  the disk device. This could be caused by:
    • A removable disk on a local or remote node was removed.
    • A disk on a local or remote node has gone offline due to errors.
    • The node that serves the disk is down.
    • The connection to a remote disk is down.

DSKOFF  Disk device is off line
  Explanation: The disk device has been placed in the off line state.
  Recommended Action: Check whether the disk device should be off line.
  This event is also signalled when the same device name is used for two
  different physical disks. The volume name in the event is the second
  node to use the same device name.

DSKQLN  High disk queue length
  Explanation: The average number of pending I/Os to the disk device
  exceeds the threshold.
  Recommended Action: More I/O requests are being queued to the disk
  device than the device can service. Reasons include a slow disk or too
  much work being done on the disk.

DSKRWT  High disk RWAIT count
  Explanation: The RWAIT count on the disk device exceeds the threshold.
  Recommended Action: RWAIT is an indicator that an I/O operation has
  stalled, usually during normal connection failure recovery or volume
  processing of host-based shadowing. A node has probably failed and
  shadowing is recovering data.

DSKUNA  Disk device is unavailable
  Explanation: The disk device has been placed in the Unavailable state.
  Recommended Action: The disk device state has been set to
  /NOAVAILABLE. See DCL help for the SET DEVICE/AVAILABLE command.

DSKWRV  Wrong volume mounted
  Explanation: The disk device has been mounted with the wrong volume
  label.
  Recommended Action: Set the correct volume name by entering the DCL
  command SET VOLUME/LABEL on the node.

ELIBCR  Bad CRC for exportable program library
  Explanation: The CRC calculation for the exportable program library
  does not match the CRC value in the library.
  Recommended Action: The exportable program library may be corrupt.
  Restore the exportable program library from its original source.

ELIBNP  No privilege to access exportable program library
  Explanation: Unable to access the exportable program library.
  Recommended Action: Check to make sure that the Data Analyzer has the
  proper security access to the exportable program library file.

ELIBUR  Unable to read exportable program library
  Explanation: Unable to read the exportable program library for the
  combination of hardware architecture and OpenVMS version.
  Recommended Action: The exportable program library may be corrupt.
  Restore the exportable program library from its original source.

FXBRCT  Fix context does not exist on node
  Explanation: The Data Analyzer tried to perform a fix, but the fix
  context on the node does not exist. The context holds the original
  request and any response output for the request.
  Recommended Action: This event could occur if there is network
  congestion or some problem with the node. Confirm the connection to
  the node, and reapply the fix if necessary.

FXCPKT  Received a corrupt fix response packet from node
  Explanation: The Data Analyzer tried to perform a fix, but the fix
  acknowledgment from the node was corrupt.
  Recommended Action: This event could occur if there is network
  congestion or some problem with the node. Confirm the connection to
  the node, and reapply the fix if necessary.

FXCRSH  Crash node fix
  Explanation: The Data Analyzer has successfully performed a Crash Node
  fix on the node.
  Recommended Action: This informational message indicates a successful
  fix. Expect to see a Path Lost event for the node.

FXDCPR  Decrement process priority fix
  Explanation: The Data Analyzer has successfully performed a Decrement
  Process Priority fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. Setting a process priority too low takes CPU time away from the
  process.

FXDCWS  Decrement process working set size fix
  Explanation: The Data Analyzer has successfully decreased the working
  set size of the process on the node by performing an Adjust Working
  Set fix.
  Recommended Action: This informational message indicates a successful
  fix. This fix disables the automatic working set adjustment for the
  process.

FXDLPR  Delete process fix
  Explanation: The Data Analyzer has successfully performed a Delete
  Process fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. If the process is in RWAST state, this fix does not work. This
  fix also does not work on processes created with the no delete option.

FXEXIT  Exit image fix
  Explanation: The Data Analyzer has successfully performed an Exit
  Image fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. Forcing a system process to exit its current image can corrupt
  the kernel.

FXINPR  Increment process priority fix
  Explanation: The Data Analyzer has successfully performed an Increment
  Process Priority fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. Setting a process priority too high takes CPU time away from
  other processes. Set the priority above 15 only for “real-time”
  processing.

FXINQU  Increment process quota limits fix
  Explanation: The Data Analyzer has successfully increased the quota
  limit of the process on the node by placing a new limit value in the
  limit field of the quota.
  Recommended Action: This informational message indicates a successful
  fix. This fix is only for the life of the process. If the problem
  continues, change the limit for the account in the UAF file.

FXINWS  Increment process working set size fix
  Explanation: The Data Analyzer has successfully increased the working
  set size of the process on the node by performing an Adjust Working
  Set fix.
  Recommended Action: This informational message indicates a successful
  fix. This fix disables the automatic working set adjustment for the
  process. The adjusted working set value cannot exceed WSQUOTA for the
  process or WSMAX for the system.

FXKERR  Error executing fix
  Explanation: The Availability Manager tried to perform a fix, but the
  fix failed for the specified reason.
  Recommended Action: The error message is from an error status returned
  from the fix. The event text will also be recorded in the Event Log.

FXMVDV  Cancel Mount Verify on Disk Volume
  Explanation: The Availability Manager has successfully performed a
  Cancel Mount Verify on Disk Volume fix on the process.
  Recommended Action: This is an informational message to indicate a
  successful fix. The disk volume can now be dismounted by a
  $ DISMOUNT/ABORT command.

FXMVSM  Cancel Mount Verify on Shadow Set Member
  Explanation: The Availability Manager has successfully performed a
  Cancel Mount Verify on Shadow Set Member fix on the process.
  Recommended Action: This is an informational message to indicate a
  successful fix. The shadow set member is ejected from the shadow set
  automatically; this is the equivalent of the
  $ SET SHADOW/FORCE_REMOVAL command.

FXNOPR  No-change process priority fix
  Explanation: The Data Analyzer has successfully performed a Process
  Priority fix on the process that resulted in no change to the process
  priority.
  Recommended Action: This informational message indicates a successful
  fix. The Fix Value slider was set to the current priority of the
  process.

FXNOQU  No-change process quota limits fix
  Explanation: The Data Analyzer has successfully performed a quota
  limit fix for the process that resulted in no change to the quota
  limit.
  Recommended Action: This informational message indicates a successful
  fix. The Fix Value slider was set to the current quota of the process.

FXNOWS  No-change process working set size fix
  Explanation: The Data Analyzer has successfully performed an Adjust
  Working Set fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. The Fix Value slider was set to the current working set size of
  the process.

FXPGWS  Purge working set fix
  Explanation: The Data Analyzer has successfully performed a Purge
  Working Set fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. The purged process might page fault to retrieve memory it needs
  for current processing.

FXPRIV  No privilege to attempt fix
  Explanation: The Data Analyzer cannot perform a fix on the node due
  either to no CMKRNL privilege or to unmatched security triplets.
  Recommended Action: See Chapter 7 for details about setting up
  security.

FXQUOR  Adjust quorum fix
  Explanation: The Data Analyzer has successfully performed an Adjust
  Quorum fix on the node.
  Recommended Action: This informational message indicates a successful
  fix. Use this fix when you find many processes in RWCAP state on a
  cluster node.

FXRESM  Resume process fix
  Explanation: The Data Analyzer has successfully performed a Resume
  Process fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. If the process goes back into suspend state, check the
  AUDIT_SERVER process for problems.

FXSUSP  Suspend process fix
  Explanation: The Data Analyzer has successfully performed a Suspend
  Process fix on the process.
  Recommended Action: This informational message indicates a successful
  fix. Do not suspend system processes.
FXTIMO  Fix timeout
  Explanation: The Data Analyzer tried to perform a fix, but no
  acknowledgment for the fix was received from the node within the
  timeout period.
  Recommended Action: This event can occur if there is network
  congestion, if some problem is causing the node not to respond, or if
  the fix request failed to reach the node. Confirm the connection to
  the node, and reapply the fix if necessary.

FXUERR  Unknown error code for fix
  Explanation: The Data Analyzer tried to perform a fix, but the fix
  failed for an unexpected reason.
  Recommended Action: Please contact your HP customer support
  representative with the text of this event. The event text is also
  recorded in the event log.

HIALNR  High alignment fault rate
  Explanation: The node’s average alignment fault rate exceeds the
  threshold.
  Recommended Action: Alignment faults occur when executable images
  access data that is not naturally aligned (the address of the data
  field is not evenly divisible by the size of the data field).
  Alignment faults must be processed by the system, which slows
  performance. Use the FLT commands of the $ ANALYZE/SYSTEM command to
  find out the origin of the faults. If possible, obtain newer versions
  of the applications causing the faults.

HIBIOR  High buffered I/O rate
  Explanation: The node’s average buffered I/O rate exceeds the
  threshold.
  Recommended Action: A high buffered I/O rate can cause high system
  overhead. If this is affecting overall system performance, use the I/O
  Summary to determine the high buffered I/O processes, and adjust their
  priorities or suspend them as needed.

HICMOQ  Many processes in COMO state waiting for CPU
  Explanation: The average number of processes on the node in the COMO
  queue exceeds the threshold.
  Recommended Action: Use the CPU Process Summary to determine which
  processes are competing for CPU and memory resources. Possible
  adjustments include changing process priorities and suspending
  processes.

HICOMQ  Many processes waiting in COM state waiting for a CPU
  Explanation: The average number of processes on the node in the COM
  queue exceeds the threshold.
  Recommended Action: Use the CPU Mode Summary to determine which
  processes are competing for CPU resources. Possible adjustments
  include changing process priorities and suspending processes.

HIDIOR  High direct I/O rate
  Explanation: The average direct I/O rate on the node exceeds the
  threshold.
  Recommended Action: A high direct I/O rate can cause high system
  overhead. If this is affecting overall system performance, use the I/O
  Summary to determine the high direct I/O processes, and adjust their
  priorities or suspend them as needed.

HIHRDP  High hard page fault rate
  Explanation: The average hard page fault rate on the node exceeds the
  threshold.
  Recommended Action: A high hard page fault rate indicates that the
  free or modified page list is too small. Check Chapter 7 for possible
  actions.

HIMWTQ  Many processes waiting in MWAIT
  Explanation: The average number of processes on the node in the
  Miscellaneous Resource Wait (MWAIT) queues exceeds the threshold.
  Recommended Action: Use the CPU and Single Process pages to determine
  which resource is awaited. See Chapter 7 for more information about
  wait states.
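
The natural-alignment rule quoted in the HIALNR entry above (a data field is aligned when its address is evenly divisible by its size) can be illustrated with a short check. The following Python sketch is illustrative only: the function name and the sample addresses are assumptions made for this example and are not part of the Availability Manager or OpenVMS.

```python
def is_naturally_aligned(address: int, size: int) -> bool:
    """Return True if the address is evenly divisible by the field size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return address % size == 0


# A 4-byte field at 0x1000 is naturally aligned; at 0x1002 it is not.
print(is_naturally_aligned(0x1000, 4))  # True
print(is_naturally_aligned(0x1002, 4))  # False
```

An access for which this check is false is the kind of access that the system must handle as an alignment fault.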




Table B-1 (Cont.) OpenVMS Events 



HINTER  High interrupt mode time
  Explanation: The average percentage of time the node spends in
  interrupt mode exceeds the threshold.
  Recommended Action: Consistently high interrupt time prohibits
  processes from obtaining CPU time. Determine which device or devices
  are overusing this mode.

HIPFWQ  Many processes waiting in PFW state
  Explanation: The average number of processes on the node that are
  waiting to page in more memory exceeds the threshold.
  Recommended Action: Use the CPU Process States and Memory Summary to
  determine which processes are in the PFW state. PFW processes could be
  constrained by too little physical memory, too restrictive working set
  quotas, or lack of available page file space.

HIPINT  High interrupt mode time on Primary CPU
  Explanation: The average percentage of time the node spends in
  interrupt mode exceeds the threshold.
  Recommended Action: Consistently high interrupt time on the Primary
  CPU can slow down I/O and the servicing of various systems in OpenVMS.
  Enabling Fast Path helps distribute the servicing of interrupts from
  I/O among the CPUs on the node. Also, determine which device or
  devices are overusing this mode.

HIPRCT  High process count
  Explanation: The proportion of actual processes to maximum processes
  is too high. If the number of processes reaches the maximum
  (MAXPROCESSCNT), no more processes can be created and the system might
  hang as a result.
  Recommended Action: Decrease the number of actual processes. Increase
  the SYSGEN parameter MAXPROCESSCNT.

HIPWIO  High paging write I/O rate
  Explanation: The average paging write I/O rate on the node exceeds the
  threshold.
  Recommended Action: Use the Process I/O and Memory Summary pages to
  determine which processes are writing to the page file excessively,
  and decide whether their working sets need adjustment.

HIPWTQ  Many processes waiting in COLPG or FPG
  Explanation: The average number of processes on the node that are
  waiting for page file space exceeds the threshold.
  Recommended Action: Use the CPU Process States and Memory Summary to
  determine which processes are in the COLPG or FPG state. COLPG
  processes might be constrained by too little physical memory, too
  restrictive working set quotas, or lack of available page file space.
  FPG processes indicate too little physical memory is available.

HISYSP  High system page fault rate
  Explanation: The node’s average page fault rate for pageable system
  areas exceeds the threshold.
  Recommended Action: These are page faults from pageable sections in
  loadable executive images, page pool, and the global page table. The
  system parameter SYSMWCNT might be set too low. Use AUTOGEN to adjust
  this parameter.

HITTLP  High total page fault rate
  Explanation: The average total page fault rate on the node exceeds the
  threshold.
  Recommended Action: Use the Memory Summary to find the page faulting
  processes, and make sure that their working sets are set properly.
HMPSYN  High multiprocessor (MP) synchronization mode time
  Explanation: The average percentage of time the node handles
  multiprocessor (MP) synchronization exceeds the threshold.
  Recommended Action: High synchronization time prevents other devices
  and processes from obtaining CPU time. Determine which device is
  overusing this mode.

HPMPSN  High MP synchronization mode time on Primary CPU
  Explanation: The average percentage of time the node handles
  multiprocessor (MP) synchronization exceeds the threshold.
  Recommended Action: High synchronization time prevents other devices
  and processes from obtaining CPU time. This is especially critical for
  the Primary CPU, which is the only CPU that performs certain tasks on
  OpenVMS. Determine which spinlocks are overusing this mode. Executing
  SYS$EXAMPLES:SPL.COM shows which spinlocks are being used.

KTHIMD  Kernel thread waiting for inner-mode semaphore
  Explanation: The average percentage of time that the kernel thread
  waits for the inner-mode semaphore exceeds the threshold.
  Recommended Action: Use SDA to determine which kernel thread of the
  process has the semaphore.
LCKBLK  Lock blocking
  Explanation: The process holds the highest priority lock in the
  resource’s granted lock queue. This lock is blocking all other locks
  from gaining access to the resource.
  Recommended Action: Use the Single Process Windows to determine what
  the process is doing. If the process is in an RWxxx state, try exiting
  the image or deleting the process. If this fails, crashing the
  blocking node might be the only other fix option.

LCKCNT  Lock contention
  Explanation: The resource has a contention situation, with multiple
  locks competing for the same resource. The competing locks are the
  currently granted lock and those that are waiting in the conversion
  queue or in the waiting queue.
  Recommended Action: Use Lock Contention to investigate a potential
  lock contention situation. Locks for the same resource might have the
  NODLCKWT wait flag enabled and be on every member of the cluster.
  Usually this is not a lock contention situation, and these locks can
  be filtered out.

LCKWAT  Lock waiting
  Explanation: The process that has access to the resource is blocking
  the process that is waiting for it. Once the blocking process releases
  its access, the next highest lock request acquires the blocking lock.
  Recommended Action: If the blocking process holds the resource too
  long, check to see whether the process is working correctly; if not,
  one of the fixes might solve the problem.

LOASTQ  Process has used most of ASTLM quota
  Explanation: Either the remaining number of asynchronous system traps
  (ASTs) the process can request is below the threshold, or the
  percentage of ASTs used compared to the allowed quota is above the
  threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  increase the ASTLM quota for the process in the UAF file. ASTLM is
  only a count; system resources are not compromised by increasing this
  count.

LOBIOQ  Process has used most of BIOLM quota
  Explanation: Either the remaining number of Buffered I/Os (BIOs) the
  process can request is below the threshold, or the percentage of BIOs
  used is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  increase the BIOLM quota for the process in the UAF file. BIOLM is
  only a count; system resources are not compromised by increasing this
  count.

LOBYTQ  Process has used most of BYTLM quota
  Explanation: Either the remaining number of bytes for the buffered I/O
  byte count (BYTCNT) that the process can request is below the
  threshold, or the percentage of bytes used is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  raise the BYTLM quota for the process in the UAF file. BYTLM is the
  number of bytes in nonpaged pool used for buffered I/O.

LODIOQ  Process has used most of DIOLM quota
  Explanation: Either the remaining number of Direct I/Os (DIOs) the
  process can request is below the threshold, or the percentage of DIOs
  used is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  increase the DIOLM quota for the process in the UAF file. DIOLM is
  only a count; system resources are not compromised by increasing this
  count.

LOENQU  Process has used most of ENQLM quota
  Explanation: Either the remaining number of lock enqueues (ENQ) the
  process can request is below the threshold, or the percentage of ENQs
  used is above the threshold.
  Recommended Action: If the limit reaches the quota, the process is not
  able to make further lock queue requests. If the process requires a
  higher quota, you can increase the ENQLM quota for the process in the
  UAF file.

LOFILQ  Process has used most of FILLM quota
  Explanation: Either the remaining number of files the process can open
  is below the threshold, or the percentage of files open is above the
  threshold.
  Recommended Action: If the amount used reaches the quota, the process
  must first close some files before being allowed to open new ones. If
  the process requires a higher quota, you can increase the FILLM quota
  for the process in the UAF file.

LOMEMY  Free memory is low
  Explanation: For the node, the percentage of free memory compared to
  total memory is below the threshold.
  Recommended Action: Use the automatic Purge Working Set fix, or use
  the Memory and CPU Summary to select processes that are either not
  currently executing or not page faulting, and purge their working
  sets.

LOPGFQ  Process has used most of PGFLQUOTA quota
  Explanation: Either the remaining number of pages the process can
  allocate from the system page file is below the threshold, or the
  percentage of pages allocated is above the threshold.
  Recommended Action: If the process requires a higher quota, you can
  raise the PGFLQUOTA quota for the process in the UAF file. This value
  limits the number of pages in the system page file that the account’s
  processes can use.

LOPGSP  Low page file space
  Explanation: Either the remaining number of pages in the system page
  file is below the threshold, or the percentage of page file space
  remaining is below the threshold.
  Recommended Action: Either extend the size of this page file or create
  a new page file to allow new processes to use the new page file.

LOPRCQ  Process has used most of PRCLM quota
  Explanation: Either the remaining number of subprocesses the current
  process is allowed to create is below the threshold, or the percentage
  of created subprocesses is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  is not allowed to create more subprocesses. If the process requires a
  higher quota, you can increase the PRCLM quota for the process in the
  UAF file.
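
The process quota events above (for example LODIOQ, LOENQU, LOFILQ, and LOPRCQ) share one condition: the event applies when either the remaining amount falls below one threshold or the percentage used rises above another. The following Python sketch shows that two-part test. The function name, parameter names, and threshold values are hypothetical; the thresholds the product actually uses are not shown here.

```python
def quota_event_due(used: int, limit: int,
                    min_remaining: int, max_percent_used: float) -> bool:
    """Return True when either low-quota condition described above holds."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    remaining = limit - used
    percent_used = 100.0 * used / limit
    # Either condition alone is enough to signal the event.
    return remaining < min_remaining or percent_used > max_percent_used


# 95 of 100 used: only 5 remain, which is below the assumed minimum of 10.
print(quota_event_due(used=95, limit=100, min_remaining=10, max_percent_used=90.0))
# 50 of 100 used: neither condition holds.
print(quota_event_due(used=50, limit=100, min_remaining=10, max_percent_used=90.0))
```

Testing both conditions lets a small quota be caught by the percentage test and a large quota by the absolute test.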

LOSTVC  Lost virtual circuit to node
  Explanation: The virtual circuit between the listed nodes has been
  lost.
  Recommended Action: Check to see whether the second node listed has
  failed or whether the connection between the nodes is broken. The VC
  name listed in parentheses is the communication link between the
  nodes.

LOSWSP  Low swap file space
  Explanation: Either the remaining number of pages in the system page
  file is below the threshold, or the percentage of page file space
  remaining is below the threshold.
  Recommended Action: Either increase the size of this page file, or
  create a new page file to allow new processes to use the new page
  file.

LOTQEQ  Process has used most of TQELM quota
  Explanation: Either the remaining number of Timer Queue Entries (TQEs)
  the process can request is below the threshold, or the percentage of
  TQEs used compared to the allowed quota is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  raise the TQELM quota for the process in the UAF file. TQELM is only a
  count; system resources are not compromised by raising it.

LOVLSP  Low disk volume free space
  Explanation: The remaining number of blocks on the volume is below the
  first threshold, or the percentage of free blocks remaining on the
  volume is below the second threshold.
  Note: For accurate evaluation of this event in an OpenVMS cluster,
  Disk Volume data must be collected over the entire cluster. This
  requirement is necessary because the free space for a disk volume is
  stored in a lock resource on only one of the nodes in the cluster, so
  all the nodes must be examined to find the free space. To enable disk
  volume data collection over all the nodes in the cluster, either
  right-click the AMDS group name for the cluster in the System Overview
  pane and click Customize..., or enable it at the OpenVMS level by
  clicking the Customize menu item in the System Overview pane. In the
  Data Collection tab, check the Collect checkbox for the Disk Volume
  data collection to enable background data collection.
  Recommended Action: You must free up some disk volume space. If your
  intention is that the volume be filled, such as a disk dedicated to
  page or swap files, then you can filter that volume from the display.

LOVOTE  Low cluster votes
  Explanation: The difference between the number of VOTES and the QUORUM
  in the cluster is below the threshold.
  Recommended Action: Check to see whether voting members have failed.
  To avoid the hang that results if VOTES goes below QUORUM, use the
  Adjust Quorum fix.
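
The LOVOTE condition compares the margin between VOTES and QUORUM with a threshold. A minimal Python sketch of that comparison follows; the function name and the sample values are hypothetical and chosen only for illustration.

```python
def low_votes(votes: int, quorum: int, threshold: int) -> bool:
    """Return True when the margin of VOTES over QUORUM is below the threshold."""
    return (votes - quorum) < threshold


# With 5 votes and a quorum of 3, the margin of 2 is not below 1.
print(low_votes(votes=5, quorum=3, threshold=1))  # False
# With votes equal to quorum, the margin of 0 is below 1.
print(low_votes(votes=3, quorum=3, threshold=1))  # True
```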

LOWEXT  Low process working set extent
  Explanation: The process page fault rate exceeds the threshold, and
  the percentage of working set size compared to working set extent
  exceeds the threshold.
  Recommended Action: This event indicates that the WSEXTENT value in
  the UAF file might be too low. The process needs more physical memory
  but cannot obtain it; therefore, the process page faults excessively.

LOWSQU  Low process working set quota
  Explanation: The process page fault rate exceeds the threshold, and
  the percentage of working set size exceeds the threshold.
  Recommended Action: This event indicates the process needs more memory
  but might not be able to obtain it because one of the following is
  true:
    • The WSQUOTA value in the UAF file is set too low for the size of
      memory allocation requests.
    • The system is memory constrained.

LRGHSH  Remote lock hash table too large to collect data on
  Explanation: The Data Analyzer cannot investigate the node’s resource
  hash table (RESHASHTBL). It is either too sparse or too dense to
  investigate efficiently.
  Recommended Action: This event indicates that the Data Analyzer will
  take too many collection iterations to analyze lock contention
  situations efficiently. Make sure that the SYSGEN parameter RESHASHTBL
  is set properly for the node.

MINCAP  Capability version below minimum required
  Explanation: The capability version of the OpenVMS Data Collector is
  below the version required by the Data Analyzer.
  Recommended Action: Install the current version of the Data Collector
  on the OpenVMS system.

NEWMAC  Discovered new MAC address
  Explanation: The Availability Manager has discovered a new MAC address
  for the node.
  Recommended Action: This is an informational event to indicate that a
  new MAC address has been discovered for the node. This is usually done
  when a multicast Hello packet has a different MAC address. OpenVMS
  nodes may have the MAC address changed for a number of reasons, which
  include starting DECnet.

NOPGFL  No page file
  Explanation: The Data Analyzer cannot find a page file on the node.
  Recommended Action: Use SYSGEN to create and connect a page file on
  the node.

NOPLIB  No program library
  Explanation: The program library for the combination of hardware
  architecture and OpenVMS version was not found.
  Recommended Action: Check to see that all the program library files
  exist in the program library directory.

NOPRIV  Not allowed to monitor node
  Explanation: The Data Analyzer cannot monitor the node due to
  unmatched security triplets.
  Recommended Action: See Chapter 7 for details on setting up security.

NOPROC  Specific process not found
  Explanation: The Data Analyzer cannot find the process name selected
  in the Process Name Search dialog box on the Node Summary page.
  Recommended Action: This event can occur because the listed process no
  longer exists, or the process name is listed incorrectly in the dialog
  box.

NOSWFL  No swap file
  Explanation: The Data Analyzer cannot find a swap file on the node.
  Recommended Action: If you do not use swap files, you can ignore this
  event. Otherwise, use SYSGEN to create and connect a swap file for the
  node.

OPCERR  Event not sent to OPCOM
  Explanation: Either the Data Analyzer was unable to send the event to
  OPCOM because of a setup problem, or an error was returned by OPCOM.
  Recommended Action: A text message in the status field indicates that
  the Data Analyzer was not configured properly, including missing
  shareable images or incorrectly defined logical names.
  A hexadecimal condition value in the status field indicates the reason
  that OPCOM was not able to post the event. The $SNDOPR system service
  returns this value. For a list of condition values and additional
  information, see the HP OpenVMS System Services Reference Manual.

OVOERR  Event not sent to OpenView
  Explanation: The Data Analyzer was unable to send the event to
  OpenView.
  Recommended Action: The reason is stated in the event description in
  the Event pane. Problems can include the following:
    • The Data Analyzer was not configured properly, including missing
      shareable images or incorrectly defined logical names.
    • An HP OpenView policy or template might not have been deployed
      properly.
    • A problem occurred communicating to or within OpenView.
    • The user does not have sufficient privileges or quotas, or both.
    • Too many events are waiting to be escalated by OpenView.

PKTCER  Packet checksum error
  Explanation: The data packet sent to the remote node was not received
  correctly and failed to pass checksum verification.
  Recommended Action: The data packet was corrupted when it was received
  at the remote node. The most likely cause is a network hardware
  failure.

PKTFER  Packet format error
  Explanation: The data packet sent to the remote node was not in the
  correct format for the remote node to process.
  Recommended Action: Please contact your HP customer support
  representative with the full text of the event, the version of the
  Availability Manager, the configuration of the node running the Data
  Analyzer, and the configuration of the nodes being monitored.

PLIBNP  No privilege to access program library
  Explanation: Unable to access the program library.
  Recommended Action: Check to see that the Availability Manager has the
  proper security access to the program library file.

PLIBUR  Unable to read program library
  Explanation: Unable to read the program library for the combination of
  hardware architecture and OpenVMS version.
  Recommended Action: The program library is either corrupt or from a
  different version of the Availability Manager. Restore the program
  library from the last installation.

PRBIOR  High process buffered I/O rate
  Explanation: The average buffered I/O rate of the process exceeds the
  threshold.
  Recommended Action: If the buffered I/O rate is affecting overall
  system performance, lowering the process priority or suspending the
  process would allow other processes to obtain access to the CPU.

PRBIOW  Process waiting for buffered I/O
  Explanation: The average percentage of time the process is waiting for
  a buffered I/O to complete exceeds the threshold.
  Recommended Action: Use SDA on the node to ensure that the device to
  which the process is performing buffered I/Os is still available and
  is not being overused.

PRCCMO  Process waiting in COMO
  Explanation: The average number of processes on the node in the COMO
  queue exceeds the threshold.
  Recommended Action: Use the CPU Process Summary to determine which
  processes should be given more CPU time, and adjust process priorities
  and states accordingly. Use the Memory Summary to determine which
  processes should have memory reduced or be suspended and outswapped to
  free memory.

PRCCOM  Process waiting in COM state
  Explanation: The average number of processes on the node in the COM
  queue exceeds the threshold.
  Recommended Action: Use the CPU Summary to determine which processes
  should be given more CPU time, and adjust process priorities and
  states accordingly.

PRCCUR  Process has a high CPU rate
  Explanation: The average percentage of time the process is currently
  executing in the CPU exceeds the threshold.
  Recommended Action: Make sure that the listed process is not looping
  or preventing other processes from gaining access to the CPU. Adjust
  process priority or state as needed.

PRCFND  Process has recently been found
  Explanation: The Data Analyzer has discovered the process name
  selected on the Watch Process page (see Figure 7-23).
  Recommended Action: No action required.

PRCMUT  Process waiting for a mutex
  Explanation: The average percentage of time the process is waiting for
  a particular system mutex exceeds the threshold.
  Recommended Action: Use SDA to help determine which mutex the process
  is waiting for and to help determine the owner of the mutex.

PRCMWT  Process waiting in MWAIT
  Explanation: The average percentage of time the process is in a
  Miscellaneous Resource Wait (MWAIT) state exceeds the threshold.
  Recommended Action: Various resource wait states are part of the
  collective wait state called MWAIT. See Appendix A for a list of these
  states. The CPU Process page and the Single Process page display which
  state the process is in. Check the Single Process page to determine
  which resource the process is waiting for and whether the resource is
  still available for the process.

PRCPSX  Process waiting in PSXFR
  Explanation: The average percentage of time the process waits during a
  POSIX fork operation exceeds the threshold.

PRCPUL  Most of CPULIM process quota used
  Explanation: The remaining CPU time available for the process is below
  the threshold.
  Recommended Action: Make sure the CPU time allowed for the process is
  sufficient for its processing needs. If not, increase the CPU quota in
  the UAF file of the node.
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PRCPWT Process The average percentage of time Check to make sure the system page 
waiting in the process is waiting to access file is large enough for all the resource 
COLPG, the system page file database requests being made. 

PFW or FPG _ exceeds the threshold. 

PRCQUO  Process waiting for a quota
    Explanation: The average percentage of time the process is waiting for a
    particular quota exceeds the threshold.
    Recommended Action: Use the Single Process pages to determine which
    quota is too low. Then adjust the quotas of the account in the UAF file.

PRCRWA  Process waiting in RWAST
    Explanation: The average percentage of time the process is waiting in
    the RWAST state exceeds the threshold. RWAST indicates the process is
    waiting for an asynchronous system trap to complete.
    Recommended Action: Use the Single Process pages to determine if RWAST
    is due to the process quota being set too low. If not, use SDA to
    determine if RWAST is due to a problem between the process and a
    physical device.

PRCRWC  Process waiting in RWCAP
    Explanation: The average percentage of time the process is waiting in
    the RWCAP state exceeds the threshold. RWCAP indicates that the process
    is waiting for CPU capability.
    Recommended Action: When many processes are in this state, the system
    might be hung because not enough nodes are running in the cluster to
    maintain the cluster quorum. Use the Adjust Quorum fix to correct the
    problem.

PRCRWM  Process waiting in RWMBX
    Explanation: The average percentage of time the process is waiting in
    the RWMBX state exceeds the threshold. RWMBX indicates the process is
    waiting for a full mailbox to be empty.
    Recommended Action: Use SDA to help determine which mailbox the process
    is waiting for.

PRCRWP  Process waiting in RWPAG, RWNPG, RWMPE, or RWMPB
    Explanation: The average percentage of time the process is waiting in
    the RWPAG, RWNPG, RWMPE, or RWMPB state exceeds the threshold. RWPAG and
    RWNPG are for paged or nonpaged pool; RWMPE and RWMPB are for the
    modified page list.
    Recommended Action: Processes in the RWPAG or RWNPG state can indicate
    you need to increase the size of paged or nonpaged pool, respectively.
    Processes in the RWMPB state indicate that the modified page writer
    cannot handle all the modified pages being generated. See Chapter 7 for
    suggestions.

PRCRWS  Process waiting in RWSCS, RWCLU, or RWCSV
    Explanation: The average percentage of time the process is waiting in
    the RWSCS, RWCLU, or RWCSV state exceeds the threshold. RWCSV is for the
    cluster server; RWCLU is for the cluster transition; RWSCS is for
    cluster communications. The process is waiting for a cluster event to
    complete.
    Recommended Action: Use the Show Cluster utility to help investigate.

PRCUNK  Process waiting for a system resource
    Explanation: The average percentage of time the process is waiting for
    an undetermined system resource exceeds the threshold.
    Recommended Action: The state in which the process is waiting is unknown
    to the Data Analyzer.

PRDIOR  High process direct I/O rate
    Explanation: The average direct I/O rate of the process exceeds the
    threshold.
    Recommended Action: If the I/O rate is affecting overall system
    performance, lowering the process priority might allow other processes
    to obtain access to the CPU.

PRDIOW  Process waiting for direct I/O
    Explanation: The average percentage of time the process is waiting for a
    direct I/O to complete exceeds the threshold.
    Recommended Action: Use SDA on the node to ensure that the device to
    which the process is performing direct I/Os is still available and is
    not being overused.

PRLCKW  Process waiting for a lock
    Explanation: The average percentage of time the process is waiting in
    the control wait state exceeds the threshold.
    Recommended Action: The control wait state indicates that a process is
    waiting for a lock. Although no locks might appear in Lock Contention,
    the awaited lock might be filtered out of the display.

PRPGFL  High process page fault rate
    Explanation: The average page fault rate of the process exceeds the
    threshold.
    Recommended Action: The process is memory constrained; it needs an
    increased number of pages to perform well. Make sure that the working
    set quotas and extents are set correctly. To increase the working set
    quota temporarily, use the Adjust Working Set fix.

PRPIOR  High process paging I/O rate
    Explanation: The average page read I/O rate of the process exceeds the
    threshold.
    Recommended Action: The process needs an increased number of pages to
    perform well. Make sure that the working set quotas and extents are set
    correctly. To increase the working set quota temporarily, use the Adjust
    Working Set fix.

PTHLST  Path lost
    Explanation: The connection between the Availability Manager and the
    data collection node has been lost.
    Recommended Action: Check to see whether the node failed or there are
    problems with the LAN segment to the node. This event occurs when the
    server no longer receives data from the node on which data is being
    collected.

RESDNS  Resource hash table dense
    Explanation: The percentage of occupied entries in the hash table
    exceeds the threshold.
    Recommended Action: A densely populated table can result in a
    performance degradation. Use the system parameter RESHASHTBL to adjust
    the total number of entries.

RESPRS  Resource hash table sparse
    Explanation: The percentage of occupied entries in the hash table is
    less than the threshold.
    Recommended Action: A sparsely populated table wastes memory resources.
    Use the system parameter RESHASHTBL to adjust the total number of
    entries.

UEXPLB  Using OpenVMS program export library
    Explanation: The program library for the combination of hardware
    architecture and OpenVMS version was not found.
    Recommended Action: Check to see that all the program library files
    exist in the program library directory.

UNSUPP  Unsupported node
    Explanation: The Data Analyzer does not support this combination of
    hardware architecture and OpenVMS version.
    Recommended Action: Check the product SPD for supported system
    configurations.

VLSZCH  Volume size changed
    Explanation: Informational message to indicate that the volume has been
    resized.
    Recommended Action: No further investigation is required.

WINTRN  High window turn rate
    Explanation: This indicates that current open files are fragmented.
    Reading from fragmented files or extending a file size, or both, can
    cause a high window turn rate.
    Recommended Action: Defragment heavily used volumes using BACKUP or a
    disk fragmentation program. For processes that extend the size of a
    file, make sure that the file extent value is large. (See the $SET
    RMS/EXTEND_QUANTITY command documentation for more information.)
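
The RESDNS and RESPRS entries in this table both compare the percentage of occupied resource hash table entries against a threshold. As a rough illustration only, that comparison can be sketched as follows. The function name, the sample counts, and the 90 and 10 percent thresholds are assumptions made for this example; they are not values taken from the Availability Manager.

```python
def classify_hash_table(occupied, total, dense_pct=90.0, sparse_pct=10.0):
    """Classify a resource hash table by its percentage of occupied entries.

    Returns "RESDNS" when the occupancy exceeds dense_pct, "RESPRS" when it
    is below sparse_pct, and None otherwise. Both thresholds are
    hypothetical defaults chosen for this sketch.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of entries")
    occupied_pct = 100.0 * occupied / total
    if occupied_pct > dense_pct:
        return "RESDNS"   # densely populated: possible performance degradation
    if occupied_pct < sparse_pct:
        return "RESPRS"   # sparsely populated: memory is being wasted
    return None


if __name__ == "__main__":
    for occupied in (950, 500, 50):
        print(occupied, classify_hash_table(occupied, 1000))
```

In either case the recommended action in the table is the same: adjust the total number of entries with the RESHASHTBL system parameter.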

Table B-2 Windows Events

Event Description Explanation Recommended Action

CFGDON  Configuration done
    Explanation: The Availability Manager has made a connection to the node
    and will start collecting the data according to the Customize Data
    Collection options.
    Recommended Action: An informational event to indicate that the node is
    recognized. No further investigation is required.

NODATA  Unable to collect performance data
    Explanation: The Data Analyzer is unable to collect performance data
    from the node.
    Recommended Action: The performance data is collected by the PerfServ
    service on the remote node. Check to see that the service is up and
    running properly.

NOPRIV  Not allowed to monitor node
    Explanation: The Data Analyzer cannot monitor the node due to a password
    mismatch between the Data Collector and the Data Analyzer.
    Recommended Action: See Chapter 7 for details on setting up security.

PTHLST  Path lost
    Explanation: The connection between the Data Analyzer and the Data
    Collector has been lost.
    Recommended Action: Check if the node crashed or if the LAN segment to
    the node is having problems. This event occurs when the server no longer
    receives data from the node on which data is being collected.

PVRMIS  Packet version mismatch
    Explanation: This version of the Availability Manager is unable to
    collect performance data from the node because of a data packet version
    mismatch.
    Recommended Action: The version of the Data Collector is more recent
    than the Data Analyzer. To process data from the node, upgrade the Data
    Analyzer to correspond to the Data Collector.
OpenVMS Events by Types of Data Collections


This appendix shows the events that can be signaled for each type of OpenVMS 
data collected. The events are categorized as follows: 


• Threshold events (Table C-1)

• Nonthreshold events (Table C-2)


Appendix B describes these events in detail and provides recommended actions. 


Note 


The procedure for enabling the data collections listed in these tables
is described in Chapter 7. The only exceptions are the events listed
under “Process name scan” in Table C-1, which are enabled on the Watch
Process Customization page (see Figure 7-23).
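
Most of the threshold events described in Appendix B are worded as an average value that exceeds a threshold. A minimal sketch of that comparison is shown below, assuming a simple arithmetic mean over a list of recent samples. The function name and the sample values are hypothetical, and the sketch does not describe how the Data Analyzer itself averages data or counts occurrences.

```python
def exceeds_threshold(samples, threshold):
    """Return True when the mean of the samples is above the threshold.

    An empty sample list never signals an event in this sketch.
    """
    if not samples:
        return False
    average = sum(samples) / len(samples)
    return average > threshold


if __name__ == "__main__":
    # Hypothetical buffered I/O rates for one process, threshold of 50.
    print(exceeds_threshold([40, 60, 80], 50))
```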


Table C-1 OpenVMS Threshold Events

Types of Data
Collection          Event   Description

Disk status         DSKERR  High disk error count
                    DSKINV  Disk is invalid
                    DSKMNV  Disk in mount verify state
                    DSKMTO  Disk mount verify timeout
                    DSKOFF  Disk device is off line
                    DSKRWT  High disk RWAIT count
                    DSKUNA  Disk device is unavailable
                    DSKWRV  Wrong volume mounted
                    WINTRN  High window turn rate

Disk volume         DSKQLN  High disk queue length
                    LOVLSP  Low disk volume free space
                    VLSZCH  Volume size changed

Node summary        HIALNR  High alignment fault rate
                    HIBIOR  High buffered I/O rate
                    HICMOQ  Many processes waiting in COMO
                    HICOMQ  Many processes waiting in COM
                    HIDIOR  High direct I/O rate
                    HIHRDP  High hard page fault rate
                    HIMWTQ  Many processes waiting in MWAIT
                    HINTER  High interrupt mode time
                    HIPFWQ  Many processes waiting in PFW state
                    HIPINT  High interrupt mode time on Primary CPU
                    HIPRCT  High process count
                    HIPWIO  High paging write I/O rate
                    HIPWTQ  Many processes waiting in COLPG or FPG
                    HISYSP  High system page fault rate
                    HITTLP  High total page fault rate
                    HMPSYN  High multiprocessor (MP) synchronization mode time
                    HPMPSN  High interrupt mode time on Primary CPU
                    LOMEMY  Free memory is low

Lock contention     LCKCNT  Lock contention
                    LRGHSH  Remote lock hash table too large to collect data
                    RESDNS  Resource hash table dense
                    RESPRS  Resource hash table sparse

Single lock         LCKBLK  Lock blocking
                    LCKWAT  Lock waiting

Single process      KTHIMD  Kernel thread waiting for inner-mode semaphore
                    LOASTQ  Process has used most of ASTLM quota
                    LOBIOQ  Process has used most of BIOLM quota
                    LOBYTQ  Process has used most of BYTLM quota
                    LODIOQ  Process has used most of DIOLM quota
                    LOENQU  Process has used most of ENQLM quota
                    LOFILQ  Process has used most of FILLM quota
                    LOPGFQ  Process has used most of PGFLQUOTA quota
                    LOPRCQ  Process has used most of PRCLM quota
                    LOTQEQ  Process has used most of TQELM quota
                    LOWEXT  Low process working set extent
                    LOWSQU  Low process working set quota
                    PRBIOR  High process buffered I/O rate
                    PRBIOW  Process waiting for buffered I/O
                    PRCCMO  Process waiting in COMO
                    PRCCOM  Process waiting in COM
                    PRCCUR  Process has a high CPU rate
                    PRCMUT  Process waiting for a mutex
                    PRCPSX  Process waiting in PSXFR wait state
                    PRCPUL  Most of CPULIM process quota used
                    PRCPWT  Process waiting in COLPG, PFW, or FPG
                    PRCQUO  Process waiting for a quota
                    PRCRWA  Process waiting in RWAST
                    PRCRWC  Process waiting in RWCAP
                    PRCRWM  Process waiting in RWMBX
                    PRCRWP  Process waiting in RWPAG, RWNPG, RWMPE, or RWMPB
                    PRCRWS  Process waiting in RWSCS, RWCLU, or RWCSV
                    PRCUNK  Process waiting for a system resource
                    PRDIOR  High process direct I/O rate
                    PRDIOW  Process waiting for direct I/O
                    PRLCKW  Process waiting for a lock
                    PRPGFL  High process page fault rate
                    PRPIOR  High process paging I/O rate

Process I/O         LOBIOQ  Process has used most of BIOLM quota
                    LOBYTQ  Process has used most of BYTLM quota
                    LODIOQ  Process has used most of DIOLM quota
                    LOFILQ  Process has used most of FILLM quota
                    PRBIOR  High process buffered I/O rate
                    PRDIOR  High process direct I/O rate
                    PRPIOR  High process paging I/O rate

Page/swap file      LOPGSP  Low page file space
                    LOSWSP  Low swap file space
                    NOPGFL  No page file
                    NOSWFL  No swap file

Cluster summary     LOVOTE  Low cluster votes

Memory              LOWEXT  Low process working set extent
                    LOWSQU  Low process working set quota
                    PRPGFL  High process page fault rate
                    PRPIOR  High process paging I/O rate

CPU process         PRCCOM  Process waiting in COM or COMO
                    PRCCUR  Process has a high CPU rate
                    PRCMWT  Process waiting in MWAIT (See Appendix A for a
                            breakdown of MWAIT state.)
                    PRCPWT  Process waiting in COLPG, PFW, or FPG

Process name scan   NOPROC  Specific process not found
                    PRCFND  Process has been discovered recently

Table C-2 OpenVMS Nonthreshold Events

Type of Data
Collected                  Event   Description

Application-level event    OPCERR  Failed to send event to OPCOM
                           OVOERR  Failed to send event to OpenView

Data Collection event      DCCOLT  Data collection completed
                           DCSLOW  Data collection taking longer than
                                   collection interval

Node-level event           CFGDON  Configuration done
                           CHGMAC  Changed MAC address
                           DPGERR  Error executing driver program
                           MINCAP  Capability version below minimum required
                           NEWMAC  Discovered new MAC address
                           NOPRIV  Not allowed to monitor node
                           PKTCER  Packet checksum error
                           PKTFER  Packet format error
                           PTHLST  Path lost

Program library error      ELIBCR  Bad CRC for exportable program library
                           ELIBNP  No privilege to access exportable program
                                   library
                           ELIBUR  Unable to read exportable program library
                           NOPLIB  No program library
                           PLIBNP  No privilege to access program library
                           PLIBUR  Unable to read program library
                           UEXPLB  Using exportable program library
                           UNSUPP  Unsupported node

Events generated by fixes  FXBRCT  Fix context does not exist on node
                           FXCPKT  Received a corrupt fix response packet
                                   from node
                           FXCRSH  Crash node fix
                           FXDCPR  Decrement process priority fix
                           FXDCWS  Decrement process working set size fix
                           FXDLPR  Delete process fix
                           FXEXIT  Exit image fix
                           FXINPR  Increment process priority fix
                           FXINQU  Increment process quota limits fix
                           FXINWS  Increment process working set size fix
                           FXKERR  Error executing fix
                           FXMVDV  Cancel Mount Verify on Disk Volume
                           FXMVSM  Cancel Mount Verify on Shadow Set Member
                           FXNOPR  No parameter change with fix to priority
                           FXNOQU  No quota change with fix to priority
                           FXNOWS  No working set change with fix to priority
                           FXPGWS  Purge working set fix
                           FXPRIV  No privilege to attempt fix
                           FXQUOR  Adjust quorum fix
                           FXRESM  Resume process fix
                           FXSUSP  Suspend process fix
                           FXTIMO  Fix timeout
                           FXUERR  Unknown error code for fix
Access control lists (ACLs), 1-9
Adjust AST Queue Limit fix, 6-20 
Adjust Quorum fix, 6—7 
Adjust Working Set fix, 6-17 
AMDS$AM_CONFIG logical name, 1-10 
AMDS$AM_LOG:ANALYZEREVENTS_CONNi_yyyymmdd-hhmm.LOG file
on OpenVMS systems, 5-5
AMDS$AM_LOGICALS.COM file, 7-10
AMDS$DEVICE, 2-2
AMDS$DRIVER_ACCESS.DAT file, 1-10, 2-2
AMDS$GROUP_NAME, 2-2
AMDS$GROUP_NAME logical name, 7-10
AMDS$LOGICALS.COM file
setting AMDS$DEVICE, 2-2
setting AMDS$GROUP_NAME, 2-2
setting AMDS$RM_DEFAULT_INTERVAL, 2-2
setting AMDS$RM_SECONDARY_INTERVAL, 2-2

AMDS$RM_DEFAULT_INTERVAL, 2-2 
AMDS$RM_SECONDARY_INTERVAL, 2-2 
AMDS$SYSTARTUP.COM file, 2-1 
AMDS$SYSTARTUP.TEMPLATE file, 2-1 
AMDS$SYSTARTUP_VMS.COM file, 2-3 
AnalyzerEvents.log file 

on Windows systems, 5-5 
APCs (asynchronous procedure calls), 3-10 
ASTLM (AST limit) quota, B-8 
Asynchronous procedure calls (APCs), 3-10 
Asynchronous system traps (ASTs) 

Adjust AST Queue Limit fix, 6-20 
Automatic data collection, 1—16 
AVAIL/ANALYZER command 

to start Availability Manager Data Analyzer, 

2-4 
AVAIL/SERVER command 
to start Availability Manager Data Server, 
2-20 
Availability Manager 

URL, 2-1 
Availability messages 

sent to Data Analyzer, 1-13 




Index 


Background data collection, 1-15 
Blocks 
in use, remaining, 7-17 
Bridging information for routers, 1-3 
Buffered I/O 
byte limit (BYTLM), 3-20 
limit, 3-20 
rate, 3-19, 7-19, B-6 
Buffered I/O (BIO) fix, 6-19 
Byte limit remaining for process I/O, 3-20 


C


Cancel Disk Mount Verification (MV) fix, 6-28 
Cancel Shadow Set Mount Verification (SSM MV) 
fix, 6-29 
Channels 
definition, 4-13 
details, 4-23 
LAN virtual circuit, 4-33, 4-34 
summary data, 4-13 
Circuits 
with individual nodes, 4-2 
Cluster hung fix, 6-3 
Cluster interconnects 
fixes, 6-31, 6-32
Clusters 
See OpenVMS Clusters 


Collecting data 


See Data collection 
Collection intervals, 1-14, 3-1 
Command procedures 

user action, 7—29 
Commands 

user action, 7—29 
Configuration, 1-4 
Congestion control 

transmitting data, 4-32 
Connection failed state, 2-25 
CPU modes 

OpenVMS, 3-10, 3-11 

Windows, 3-8 




CPU process states, A-1
OpenVMS, 3-10
CPUs (central processing units)
active
number active on a node, 3-8
configured
number configured to run on a node, 3-8
improving performance by suspending, 6-13
modes, 3-8
summary information, 3-8
process states, A-1
process summary, 3-12
setting process priorities, 6-15
usage, 3-8
wait state, 3-13
Crash Node fix, 6-8
Creating Key Store
from analyzer system, 2-11
from server system, 2-7
Creating Trust Store
from analyzer system, 2-12, 2-18
from server system, 2-9
Customizing
access codes, 1-11
events, 5-8, 7-23
levels of, 7-4
OpenVMS
data collection, 7-11
data filters, 7-14
events, 7-27
group membership, 7-10
security features, 7-33
security features, 7-33
security triplets, 1-11
Windows
events, 7-27
group membership, 7-10
security features, 7-33


D


Data Analyzer
adding to trust store on analyzer system, 2-18
creating trust store on analyzer system, 2-12, 2-18
creating trust store on server system, 2-9
event log files, 5-5
exporting public key as trusted certificate, 2-9, 2-13, 2-15
generating key pair on analyzer system, 2-11
generating key pair on server system, 2-8
importing public key from trust certificates, 2-18
importing public key from trusted certificate, 2-18
nodes, 1-3
passwords, 1-8
security, 1-8
starting on OpenVMS Alpha or I64, 2-4
starting on Windows, 2-4
using localhost name, 2-22
using public keys from Data Server, 2-6, 2-17
Data collection
automatic, 1-16
background, 1-15
changing collection intervals, 7-14
customization
selecting data to collect, 2-34
customizing settings, 7-11
default, 2-35
definition of one, 1-15
events associated with, 1-15
foreground, 1-16
frequency of, 1-16
intervals, 1-16
specifying types, 2-34
state, 2-25
Data Collector
for DECamds and Availability Manager, 2-1
installing from latest kit, 2-3
nodes, 1-3
restarting, 7-10
RMDRIVER, 4-1
security
OPCOM log, 1-9
private LAN transport, 1-6, 1-8
security triplets, 1-8
Data filters
changing values, 7-14
Data packets
receipt, 4-31
transmission, 4-30
Data Server
assessing need to set one up, 2-4
creating key store from analyzer system, 2-11
creating key store from server system, 2-7
description, 1-3
exporting public key as trusted certificate, 2-13
exporting public key as trusted certificates, 2-9
generating key pair, 2-7, 2-11
purpose, 1-6
set up from analyzer system, 2-11
set up from server system, 2-6
starting on OpenVMS Alpha or I64, 2-20
starting on Windows, 2-20
using in a WAN, 1-7
using to pass data over a WAN, 1-6
DECamds, 1-1
changes and enhancements
no installation of server, 2-1
Deferred procedure calls (DPCs), 3-9
Delete Process fix, 6-11
DIOLM (Direct I/O limit), 3-20
Direct I/O fix, 6-18 
Direct I/O rate, 3-19 
Disk 
fixes, 6-2, 6-27 
Disk fixes, 6-2, 6-3 
Disks 


cancel disk mount verification (MV) fix, 6-28 
cancel shadow set mount verification (SSM MV) 


fix, 6-29 
fixes, 6-3 
OpenVMS 
single disk summary, 3-24 
status summary, 3-22 
summaries, 3-22 
volume summary, 3-25 
Windows 
logical summary, 3-26 
physical summary, 3-27 
Disk status 
filtering data, 7-16 
Display data collection interval, 1-17 


E 


ECS 
criteria, 4-27 
Equivalent Channel Set 
See ECS 
Escalation of events, 7—28 
Event data collection interval, 1-17 
Event escalation, 7—23 
Event pane, 2-35, 5-1 
Events 
definition, 1-14 
displaying information, 5-1, 5-7 
escalation, 7—23 
log files, 5-5 
occurrence value, 7—28 
OpenVMS, B-1 
posting, 1-17 
severity, 5-2, 7-28 
signaling performance problems, 1-14 
testing for, 5-3 
threshold defaults, 4—3 
thresholds, 7—28 
thresholds for posting, 1-18 
user actions, 7—28 
Windows, B—16 
Exit Image fix, 6-12 


F 


File protection 
for security, 1-9 


Filtering data 


methods, 7—1 


Filters 


OpenVMS CPU, 7-15 

OpenVMS disk status, 7-16 

OpenVMS disk volume, 7-17 
OpenVMS I/O, 7-18 

OpenVMS lock contention, 7—19 
OpenVMS memory, 7-20 

OpenVMS page/swap file, 7-21 
specifying types of data to collect, 7-14 


Fixes 


adjusting AST queue limit, 6—20 
adjusting buffered I/O count limit, 6-19 
adjusting creation limit of subprocess, 6—24 
adjusting direct I/O count limit, 6-18 
adjusting I/O byte limit, 6-25 
adjusting lock queue limit, 6-22 
adjusting open file limit, 6-21 
adjusting pagefile quota limit, 6-26 
adjusting quorum, 6—7 
adjusting resource limits, 6-18 
adjusting time queue entry limit, 6-23 
adjusting working set size, 6-17 
cancel disk mount verification (MV), 6-28 
cancel shadow set mount verification (SSM 
MV), 6-29 
changing process priority, 6-15 
CMKRNL privilege required, 6-5 
crashing a node, 6-8 
deleting a process, 6-11 
description, 6-1 
disk, 6-2, 6-3 
exiting an image, 6-12 
LAN checksumming, 6-33 
LAN device 
adjusting priority, 6—42 
setting maximum buffer, 6-43 
starting device, 6-44 
stopping device, 6—45 
LAN path 
adjusting priority, 6-39 
changing hops, 6-40 
LAN virtual circuit 
adjusting maximum receive window size, 
6-35 


adjusting maximum transmit window size, 


6-34 

compression, 6—36 

ECS maximum delay, 6-37 
list of available, 6-1 
memory usage, 6—2, 6-5 
problems and recommended fixes, 6-3 
purging a working set, 6-16 
results, 6-5 
resuming a process, 6-14 
suspending a process, 6-13 


system service calls associated with, 6-1, 6-5 
Fixes (cont’d)
types, 6-1 
Foreground data collection, 1-16 


G 


Galaxy ID, 3-8 
Graphical user interface 
See GUI 
Group/Node pane, 2-27 
See also Groups, Nodes 
Groups 
See also Group/Node pane 
changing, 7-11 
displaying, 2-29 
GUI (graphical user interface), 1-1 


H 


Hardware 
security triplet address, 1-11 
Hardware model, 3-7 
Help 
See Online help 
HP OpenView 
configuring on your system, 7—25 
signalling events to, 7-23 
using on your system, 7-26 


I/O (input/output)
adjusting AST limits fix, 6-18 
current, threshold, and peak values, 3-19 
default data collection, 7-13 
page/swap files, 3-21 
page fault rate, 3-16 
process quotas, 3-39 
rates per process, 3-20 
summary, 3-18 
I/O byte fix, 6-25 
Icons 
colors represent states, 3-2 
IEEE 802.3 Extended Packet format protocol, 1-3 
Increasing resource limits fix, 6-18 
Interrupts per second, 3-9 
Intruder fix, 6-38 
IPID (internal PID), 3-11 


J


Java GUI, 1-1
Job quotas in use
single process, 3-41
JOB_CONTROL process, 6-16


K


Key and Trust Store
adding trusted certificates, 2-18
copying key store, 2-14, 2-15
created and maintained by Data Analyzer, 2-6
creating default trust store from analyzer system, 2-14
creating default trust store from server system, 2-9
creating key store from analyzer system, 2-11
creating key store from server system, 2-7
creating key store - introduction, 2-6
creating trust store from analyzer system, 2-12, 2-18
creating trust store from server system, 2-9
creating trust store - introduction, 2-6
default key store name and path, 2-6
default trust store name and path, 2-6
exporting public key from key store, 2-9, 2-18, 2-15
importing public key into trust store, 2-18
introduction, 2-6
opening key store, 2-24
opening key store from analyzer system, 2-15
opening key store from server system, 2-8
opening trust store, 2-24
Key Pair
creating, 2-6, 2-7, 2-8, 2-11
definition, 2-5
exporting public key, 2-9, 2-13, 2-15
importing public key, 2-18
storing in key store, 2-6
LAN


See LAN devices, LAN fixes, LAN paths 
displays, 4-1 


LAN channels, 4-2 


details 
counters data, 4-24 
ECS criteria data, 4-27 
errors data, 4-25 
overview data, 4-23 
remote system data, 4-26 
details data, 4-23 
summary data, 4-13 


LAN devices, 4-2 


data displayed, 4-15 
detail data, 4-17 
details 
errors data, 4-21 
events data, 4-20 
overview data, 4-17 
receive data, 4-19 
transmit data, 4-18 


LAN devices (cont’d) 
fixes, 6-41 
overview data, 4-18 
LAN fixes, 6-30 
adjusting device priority, 6—42 
adjusting priority, 6-39 
changing channel hops, 6—40 
ECS Maximum Delay, 6-37 
setting maximum packet size, 6-43 
starting device, 6-44 
stopping device, 6—45 
VC checksumming, 6-33 
VC compression, 6-36 
VC maximum receive window size, 6—35 
VC maximum transmit window size, 6-34 
LAN path (channel) 
fixes, 6-38 
LAN paths, 4-2 
LAN virtual circuits, 4-2 
detailed data, 4-29 
fixes, 6-33 
PEDRIVER, 4-2 
summary data, 4-11 
LAVC 
See NISCA and SCS 
Local area networks (LANs), 1-1 
Localhost name usage in Data Server connections, 
2-22 
Lock block 
data, 3-31 
log file, 3-33
Lock block log
location of file, 3-33
reason for logging, 3-33
resource name dump, 3-33 
Lock Block Log 
example, 3-34 
Lock contention 
OpenVMS, 3-28, 7-13 
Lock contention page 
data displayed, 3-29 
decoded format, 3-29 
raw format, 3-30 
Lock Contention page 
flags, 3-32 
formats, 3-29 
Lock ID, 3-32 
lock status, 3-29 
modes, 3-32 
number of locks, 3—29 
resource block address, 3-30 
resource names, 3-29 
resource value block dump, 3-30 
state of lock, 3-32 
Lock queue limit fix, 6-22 
Locks 
contention for, 3—28 


Logical disks 

Windows, 3-26 
Logical names 

sending messages to OPCOM, 1-13 
Low memory fix, 6-3 


M 


Managed objects 
support for, 4-1 
Managed objects display, 4-34 


Memory, 3-8 
count, 3-15 
data, 7-20 


default data collection, 7-13 
low memory fix, 6-3 
OpenVMS summary, 3-14 
summaries, 3-13 
total for a node, 3-8 
Windows summary, 3-13 
Memory usage 
displaying, 3-14 
fixes, 6-5 
Menu bar 
in System Overview window, 2-36 
Messages 
sending to OPCOM, 1-13 
Modes 
See CPU modes 
Monitoring processes, 7—36 
Multicast “Hello” messages 
announcing node availability, 1-12, 2-25, 2-32 
controlling rate of messages 
AMDS$RM_DEFAULT_INTERVAL, 2-2 
AMDS$RM_SECONDARY_INTERVAL, 
2-2 
definition, 2-2 
number received from a node, 2-31 
showing number transmitted, 3-3, 4-19 
Mutexes 
held, 3-39 
number in node, 3-6 
MWAIT state 
resource wait table, A-2 


N 


Network address 
security triplet, 1-11 


Network Interconnect for the System 
Communications Architecture 
See NISCA 
Network protocol, 1-2 
NISCA 
LAN channels, 4-2 
LAN devices, 4-2 
LAN paths, 4-2 
LAN virtual circuits, 4-2 
NISCA transport protocol, 4-29


Node data 
OpenVMS, 3-3 
summary, 3-7 
Windows, 3-6 

Node pane 


data in OpenVMS display, 3-38 
data in Windows display, 3-6 
Windows, 3-5 
Nodes 
See also Group/Node pane 
adjusting quorum, 6-7 
crashing a node, 6-8 
displaying data, 3-1 
fixes, 6-1, 6-6 
memory usage, 3-14 
OpenVMS, 3-3 
specifying data to collect, 2-34 
summary information, 3-7 
NoEvent data collection interval, 1-17 
Non-managed objects display, 4-33 
Nonpaged pool 
displaying size, 3-14 
NOPROC event, 7-37 
Notifications from Data Collectors, 2-2 


O 


Occurrence counters 
definition, 5-2 

Occurrences 
criterion for posting an event, 1-18 
event, 1-18 

low values, 7-28 

Online help, 2-37 

OPCOM 
signalling events to, 7-23 

OPCOM (Operator Communication Manager) 
sending messages, 1-13 

Open file limit fix, 6-21 

OpenVMS Clusters 
hung, 6-3 
interconnect summary, 4—2 
members data, 4—2, 4-3 
running Availability Manager in, 1-4 
summary data, 4-2, 4-3 

Operator communications manager (OPCOM) 
security log, 1-9 

OS (operating system) version, 3-8 


P 


Packets discarded 

LAN virtual circuit, 4-387 
Page/swap files, 3-21 
Paged pool 

displaying size, 3-14 
Page faults, 3-19
Adjust Working Set fix, 6-17 
Purging Working Set fix, 6-16 
rate, 3-16 
Pagefile quota fix, 6-26 
Page files 
data collection, 7-13 
Paging write I/O rate, 3-19 
Pane 
Event, 2-35 
Panes 
changing location of column headings, 3-3 
changing sizes, 2-35 
Group, 2-26 
System Overview window, 1-3, 2-26 
Passwords, 1-8 
changing, 1-10, 7-34, 7-36 
default, 1-9 
security triplets, 1-8 
Path lost state, 2-25 
Performance 
identifying problems, 1-14 
Physical disks 
Windows, 3-27 
PID (process identifier), 3-11, 3-15, 3-20 
PIO (paging I/O) rate, 3-20 
Pipe quota 
for transmitted data, 4-32 
PRCFND event, 7-37 
PRCLM process limit, B-9 
Printing 
display, 2-37 
using Windows Paint program, 2-37 
Private Key 
definition, 2-5 
Private LAN transport security, 1-6, 1-8 
Process 
fixes, 6-9 
Processes 
adjusting limits, 6—2 
execution rates, 3-39 
filtering data, 7-15 
fixes, 6-1 
information about, 3-37 
job quotas, 3-41 
looping process fix, 6-3 
monitoring, 7-36 
privileges, 1-9 
Process Priority fix, 6-15 
quotas 
displaying, 3-39 
Resume Process fix, 6-14 
runaway process fix, 6-3 
single process data, 3-35 
Suspend Process fix, 6-13 
wait states, 3-39 
working sets, 3-38 


Process information
OpenVMS, 3-37
Process limits fixes, 6-2, 6-18
Process memory fixes, 6-2
Process Priority fix, 6-2, 6-3, 6-5, 6-15
Process quotas
adjusting, 6-3
displaying data, 3-39
Protocol
for routers, 1-4
IEEE 802.3 Extended Packet format, 1-3
network, 1-2
NISCA transport, 4-29
Public Key
also see Trusted Certificate
copying as trusted certificate, 2-18
definition, 2-5
exporting from key store, 2-9, 2-13, 2-15
importing, 2-18
storing in trust store, 2-6
Purge Working Set fix, 6-16


Q


Quotas
adjusting, 6-3
job, 3-41
process I/O, 3-39
working set, B-7


R


RADs
maximum number for a node, 3-8
Receiving information, 1-4, 1-7
Requesting information, 1-4, 1-7
Resource affinity domains
see RADs
Resource availability
displaying, 5-1
fixes, 6-1
Restarting the Data Collector, 7-10
Resume Process fix, 6-14
Runaway process
Process Priority fix, 6-3
Suspend Process fix, 6-3


S


SCA
LAN virtual circuits detailed data, 4-29
summary data, 4-6
SCA data
SCS connections data, 4-8
Screen
capturing, 2-37
SCS
System Communications Services
circuits data, 4-2
connections data, 4-2
LAN channel data, 4-2
LAN path data, 4-2
LAN virtual circuit data, 4-2
LAN virtual circuits detailed data, 4-29
SCS (System Communications Services)
connections data, 4-8
Secure Communications
introduction, 2-5
set up, 2-6
Security
access control lists (ACLs), 1-9
changing groups, 7-11
changing passwords, 1-10, 7-34, 7-36
data transfer, 1-4
file protection, 1-9
private LAN transport, 1-6, 1-8
process privileges, 1-9
triplets, 1-8, 1-10
using passwords to maintain, 1-8
Security triplets
access verification code, 1-10
changing, 1-11
description, 1-10
files, 1-4
format, 1-11
hardware address, 1-11
network address, 1-10
operation, 1-12
password, 1-10
verifying, 1-13
wildcard address, 1-11
Semaphores, 3-6
Serial Number, 3-8
Single disks, 3-24
Single process
data, 3-35
OpenVMS
execution rates, 3-39
job quotas, 3-41
process I/O quotas, 3-39
wait states, 3-39
working set, 3-38
SMP (symmetric multiprocessing), 3-10
SNAP
See 802.3 Extended Packet Format protocol
Sorting data, 2-35
Starting the Availability Manager Data Analyzer, 2-3, 2-4
Starting the Availability Manager Data Server, 2-20
Status bar
in System Overview window, 2-36
Subprocess 

adjusting creation limit, 6-24 
Subprocess creation limit fix, 6-24 
Suspend Process fix, 6-3, 6-13 
Swap files 

data collection, 7-13 
SWAPPER process 

displaying, 3-39 

fixes ignored, 6-5 
SYS$STARTUP directory, 2-3 
SYSAPs (system applications) 

See System applications (SYSAPs) 
System applications (SYSAPs), 4-7 
System cache 

displaying size in use, 3-14 
System Communications Architecture 

See SCA 
System Communications Services 

See SCS 
System Overview window 

components, 2-36 

how to display data, 2-36 

how to use, 2-25 

menu bar, 2-36 

panes, 1-3 
System service calls 

associated with fixes, 6-5 


T 


Threads, 3-6 
Thresholds 
criteria for posting an event, 1-18 
events, 7-28 
Timer Queue Entry Limit fix, 6-23 
Title bar 
in System Overview window, 2-36 
Tooltip 
example, 3-2 
explanation, 3-1 
TQELM process limit 
raising in UAF file, B-10 
Trusted Certificate 
copying, 2-18 
creating, 2-9, 2-13 
exporting, 2-15 
importing, 2-18 


U 


Uptime, 3-8 
User action 
events, 7-28 

User actions 
executing on OpenVMS system, 7-30 
executing on Windows system, 7-31 


V 


Virtual circuits 
LAN 


channel selection data, 4-33, 4-34 


closures data, 4-36 
congestion control data, 4-32 
detailed data, 4-29 
for individual nodes, 4-2 
packets discarded data, 4-37 
receive data, 4-31 
summary data, 4-11 
transmit data packets, 4-30 
Virtual memory 
displaying size, 3-14 
Volume 
default data collection, 7-13 


W 


Wait states 
calculating, 3-40 


CPU, 3-13 
process, 3-39 
WAN 


See wide area network 
Watch Process feature, 7-36 
Wide area network (WAN) 

node configuration, 1-6 

passing data over, 1-6, 1-7 

using with Data Server, 1-7 
Wide Area Network (WAN), 1-3 
Wildcard address 

security triplet, 1-11 
Window turn rate, 3-19 
Working set extent, 3-16 
Working sets 

data, 3-38 

pages, 3-38 

purging, 6-16 

size, 3-15 

size fix, 6-17 

too high or too low, 6-3 


